All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism
@ 2024-04-14  4:02 Wen Gu
  2024-04-14  4:02 ` [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration Wen Gu
                   ` (13 more replies)
  0 siblings, 14 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:02 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

This patch set acts as the second part of the new version of [1] (The first
part can be referred from [2]), the updated things of this version are listed
at the end.

- Background

SMC-D is now used in IBM z with ISM function to optimize network interconnect
for intra-CPC communications. Inspired by this, we try to make SMC-D available
on the non-s390 architecture through a software-implemented Emulated-ISM device,
that is the loopback-ism device here, to accelerate inter-process or
inter-containers communication within the same OS instance.

- Design

This patch set includes 3 parts:

 - Patch #1: some prepare work for loopback-ism.
 - Patch #2-#7: implement loopback-ism device and adapt SMC-D for it.
   loopback-ism now serves only SMC and no userspace interfaces exposed.
 - Patch #8-#11: memory copy optimization for intra-OS scenario.

The loopback-ism device is designed as an ISMv2 device and not be limited to
a specific net namespace, ends of both inter-process connection (1/1' in diagram
below) or inter-container connection (2/2' in diagram below) can find the same
available loopback-ism and choose it during the CLC handshake.

 Container 1 (ns1)                              Container 2 (ns2)
 +-----------------------------------------+    +-------------------------+
 | +-------+      +-------+      +-------+ |    |        +-------+        |
 | | App A |      | App B |      | App C | |    |        | App D |<-+     |
 | +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
 |     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
 |  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
 |     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
 +---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
              |   |           |                                  |
 Kernel       |   |           |                                  |
 +----+-------v---+-----------v----------------------------------+---+----+
 |    |                            TCP                               |    |
 |    |                                                              |    |
 |    +--------------------------------------------------------------+    |
 |                                                                        |
 |                           +--------------+                             |
 |                           | smc loopback |                             |
 +---------------------------+--------------+-----------------------------+

loopback-ism device creates DMBs (shared memory) for each connection peer.
Since data transfer occurs within the same kernel, the sndbuf of each peer
is only a descriptor and point to the same memory region as peer DMB, so that
the data copy from sndbuf to peer DMB can be avoided in loopback-ism case.

 Container 1 (ns1)                              Container 2 (ns2)
 +-----------------------------------------+    +-------------------------+
 | +-------+                               |    |        +-------+        |
 | | App C |-----+                         |    |        | App D |        |
 | +-------+     |                         |    |        +-^-----+        |
 |               |                         |    |          |              |
 |           (2) |                         |    |     (2') |              |
 |               |                         |    |          |              |
 +---------------|-------------------------+    +----------|--------------+
                 |                                         |
 Kernel          |                                         |
 +---------------|-----------------------------------------|--------------+
 | +--------+ +--v-----+                           +--------+ +--------+  |
 | |dmb_desc| |snd_desc|                           |dmb_desc| |snd_desc|  |
 | +-----|--+ +--|-----+                           +-----|--+ +--------+  |
 | +-----|--+    |                                 +-----|--+             |
 | | DMB C  |    +---------------------------------| DMB D  |             |
 | +--------+                                      +--------+             |
 |                                                                        |
 |                           +--------------+                             |
 |                           | smc loopback |                             |
 +---------------------------+--------------+-----------------------------+

- Benchmark Test

 * Test environments:
      - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
      - SMC sndbuf/DMB size 1MB.

 * Test object:
      - TCP: run on TCP loopback.
      - SMC lo: run on SMC loopback-ism.

1. ipc-benchmark (see [3])

 - ./<foo> -c 1000000 -s 100

                            TCP                  SMC-lo
Message
rate (msg/s)              79693                  148236(+86.01%)

2. sockperf

 - serv: <smc_run> sockperf sr --tcp
 - clnt: <smc_run> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30

                            TCP                  SMC-lo
Bandwidth(MBps)         4815.18                 8061.77(+67.42%)
Latency(us)               6.176                   3.449(-44.15%)

3. nginx/wrk

 - serv: <smc_run> nginx
 - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80

                           TCP                   SMC-lo
Requests/s           196555.02                263270.95(+33.94%)

4. redis-benchmark

 - serv: <smc_run> redis-server
 - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024

                           TCP                   SMC-lo
GET(Requests/s)       88711.47                120048.02(+35.32%)
SET(Requests/s)       89465.44                123152.71(+37.65%)


Change log:

v6->RFC v5
- Patch #2: make the use of CONFIG_SMC_LO cleaner.
- Patch #5: mark some smcd_ops that loopback-ism doesn't support as
  optional and check for the support when they are called.
- Patch #7: keep loopback-ism at the beginning of the SMC-D device list.
- Some expression changes in commit logs and comments.

RFC v5->RFC v4:
Link: https://lore.kernel.org/netdev/20240324135522.108564-1-guwen@linux.alibaba.com/
- Patch #2: minor changes in description of config SMC_LO and comments.
- Patch #10: minor changes in comments and if(smc_ism_support_dmb_nocopy())
  check in smcd_cdc_msg_send().
- Patch #3: change smc_lo_generate_id() to smc_lo_generate_ids() and SMC_LO_CHID
  to SMC_LO_RESERVED_CHID.
- Patch #5: memcpy while holding the ldev->dmb_ht_lock.
- Some expression changes in commit logs.

RFC v4->v3:
Link: https://lore.kernel.org/netdev/20240317100545.96663-1-guwen@linux.alibaba.com/
- The merge window of v6.9 is open, so post this series as an RFC.
- Patch #6: since some information fed back by smc_nl_handle_smcd_dev() dose
  not apply to Emulated-ISM (including loopback-ism here), loopback-ism is
  not exposed through smc netlink for the time being. we may refactor this
  part when smc netlink interface is updated.

v3->v2:
Link: https://lore.kernel.org/netdev/20240312142743.41406-1-guwen@linux.alibaba.com/
- Patch #11: use tasklet_schedule(&conn->rx_tsklet) instead of smcd_cdc_rx_handler()
  to avoid possible recursive locking of conn->send_lock and use {read|write}_lock_bh()
  to acquire dmb_ht_lock.

v2->v1:
Link: https://lore.kernel.org/netdev/20240307095536.29648-1-guwen@linux.alibaba.com/
- All the patches: changed the term virtual-ISM to Emulated-ISM as defined by SMCv2.1.
- Patch #3: optimized the description of SMC_LO config. Avoid exposing loopback-ism
  to sysfs and remove all the knobs until future definition clear.
- Patch #3: try to make lockdep happy by using read_lock_bh() in smc_lo_move_data().
- Patch #6: defaultly use physical contiguous DMB buffers.
- Patch #11: defaultly enable DMB no-copy for loopback-ism and free the DMB in
  unregister_dmb or detach_dmb when dmb_node->refcnt reaches 0, instead of using
  wait_event to keep waiting in unregister_dmb.

v1->RFC:
Link: https://lore.kernel.org/netdev/20240111120036.109903-1-guwen@linux.alibaba.com/
- Patch #9: merge rx_bytes and tx_bytes as xfer_bytes statistics:
  /sys/devices/virtual/smc/loopback-ism/xfer_bytes
- Patch #10: add support_dmb_nocopy operation to check if SMC-D device supports
  merging sndbuf with peer DMB.
- Patch #13 & #14: introduce loopback-ism device control of DMB memory type and
  control of whether to merge sndbuf and DMB. They can be respectively set by:
  /sys/devices/virtual/smc/loopback-ism/dmb_type
  /sys/devices/virtual/smc/loopback-ism/dmb_copy
  The motivation for these two control is that a performance bottleneck was
  found when using vzalloced DMB and sndbuf is merged with DMB, and there are
  many CPUs and CONFIG_HARDENED_USERCOPY is set [4]. The bottleneck is caused
  by the lock contention in vmap_area_lock [5] which is involved in memcpy_from_msg()
  or memcpy_to_msg(). Currently, Uladzislau Rezki is working on mitigating the
  vmap lock contention [6]. It has significant effects, but using virtual memory
  still has additional overhead compared to using physical memory.
  So this new version provides controls of dmb_type and dmb_copy to suit
  different scenarios.
- Some minor changes and comments improvements.

RFC->old version([1]):
Link: https://lore.kernel.org/netdev/1702214654-32069-1-git-send-email-guwen@linux.alibaba.com/
- Patch #1: improve the loopback-ism dump, it shows as follows now:
  # smcd d
  FID  Type  PCI-ID        PCHID  InUse  #LGs  PNET-ID
  0000 0     loopback-ism  ffff   No        0
- Patch #3: introduce the smc_ism_set_v2_capable() helper and set
  smc_ism_v2_capable when ISMv2 or virtual ISM is registered,
  regardless of whether there is already a device in smcd device list.
- Patch #3: loopback-ism will be added into /sys/devices/virtual/smc/loopback-ism/.
- Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active
  to activate or deactivate the loopback-ism.
- Patch #9: introduce the statistics of loopback-ism by
  /sys/devices/virtual/smc/loopback-ism/{{tx|rx}_tytes|dmbs_cnt}.
- Some minor changes and comments improvements.

[1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
[2] https://lore.kernel.org/netdev/20231219142616.80697-1-guwen@linux.alibaba.com/
[3] https://github.com/goldsborough/ipc-bench
[4] https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/
[5] https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/
[6] https://lore.kernel.org/all/20240102184633.748113-1-urezki@gmail.com/

Wen Gu (11):
  net/smc: decouple ism_client from SMC-D DMB registration
  net/smc: introduce loopback-ism for SMC intra-OS shortcut
  net/smc: implement ID-related operations of loopback-ism
  net/smc: implement DMB-related operations of loopback-ism
  net/smc: mark optional smcd_ops and check for support when called
  net/smc: ignore loopback-ism when dumping SMC-D devices
  net/smc: register loopback-ism into SMC-D device list
  net/smc: add operations to merge sndbuf with peer DMB
  net/smc: {at|de}tach sndbuf to peer DMB if supported
  net/smc: adapt cursor update when sndbuf and peer DMB are merged
  net/smc: implement DMB-merged operations of loopback-ism

 drivers/s390/net/ism_drv.c |   2 +-
 include/net/smc.h          |  21 +-
 net/smc/Kconfig            |  13 ++
 net/smc/Makefile           |   1 +
 net/smc/af_smc.c           |  28 ++-
 net/smc/smc_cdc.c          |  34 ++-
 net/smc/smc_core.c         |  61 +++++-
 net/smc/smc_core.h         |   1 +
 net/smc/smc_ism.c          |  88 ++++++--
 net/smc/smc_ism.h          |  10 +
 net/smc/smc_loopback.c     | 427 +++++++++++++++++++++++++++++++++++++
 net/smc/smc_loopback.h     |  62 ++++++
 12 files changed, 721 insertions(+), 27 deletions(-)
 create mode 100644 net/smc/smc_loopback.c
 create mode 100644 net/smc/smc_loopback.h

-- 
2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
@ 2024-04-14  4:02 ` Wen Gu
  2024-04-15  8:41   ` Alexandra Winter
  2024-04-14  4:02 ` [PATCH net-next v6 02/11] net/smc: introduce loopback-ism for SMC intra-OS shortcut Wen Gu
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:02 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

The struct 'ism_client' is specialized for s390 platform firmware ISM.
So replace it with 'void' to make SMCD DMB registration helper generic
for both Emulated-ISM and existing ISM.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 drivers/s390/net/ism_drv.c | 2 +-
 include/net/smc.h          | 4 ++--
 net/smc/smc_ism.c          | 7 ++-----
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index 2c8e964425dc..9b2a52913e76 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -726,7 +726,7 @@ static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
 }
 
 static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
-			     struct ism_client *client)
+			     void *client)
 {
 	return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, client);
 }
diff --git a/include/net/smc.h b/include/net/smc.h
index 10684d0a33df..542d12372c18 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -47,7 +47,6 @@ struct smcd_dmb {
 #define ISM_ERROR	0xFFFF
 
 struct smcd_dev;
-struct ism_client;
 
 struct smcd_gid {
 	u64	gid;
@@ -58,7 +57,7 @@ struct smcd_ops {
 	int (*query_remote_gid)(struct smcd_dev *dev, struct smcd_gid *rgid,
 				u32 vid_valid, u32 vid);
 	int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb,
-			    struct ism_client *client);
+			    void *client);
 	int (*unregister_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
 	int (*add_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
 	int (*del_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
@@ -78,6 +77,7 @@ struct smcd_ops {
 struct smcd_dev {
 	const struct smcd_ops *ops;
 	void *priv;
+	void *client;
 	struct list_head list;
 	spinlock_t lock;
 	struct smc_connection **conn;
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index ac88de2a06a0..051726586730 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -222,7 +222,6 @@ int smc_ism_unregister_dmb(struct smcd_dev *smcd, struct smc_buf_desc *dmb_desc)
 int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len,
 			 struct smc_buf_desc *dmb_desc)
 {
-#if IS_ENABLED(CONFIG_ISM)
 	struct smcd_dmb dmb;
 	int rc;
 
@@ -231,7 +230,7 @@ int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len,
 	dmb.sba_idx = dmb_desc->sba_idx;
 	dmb.vlan_id = lgr->vlan_id;
 	dmb.rgid = lgr->peer_gid.gid;
-	rc = lgr->smcd->ops->register_dmb(lgr->smcd, &dmb, &smc_ism_client);
+	rc = lgr->smcd->ops->register_dmb(lgr->smcd, &dmb, lgr->smcd->client);
 	if (!rc) {
 		dmb_desc->sba_idx = dmb.sba_idx;
 		dmb_desc->token = dmb.dmb_tok;
@@ -240,9 +239,6 @@ int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len,
 		dmb_desc->len = dmb.dmb_len;
 	}
 	return rc;
-#else
-	return 0;
-#endif
 }
 
 static int smc_nl_handle_smcd_dev(struct smcd_dev *smcd,
@@ -446,6 +442,7 @@ static void smcd_register_dev(struct ism_dev *ism)
 	if (!smcd)
 		return;
 	smcd->priv = ism;
+	smcd->client = &smc_ism_client;
 	ism_set_priv(ism, &smc_ism_client, smcd);
 	if (smc_pnetid_by_dev_port(&ism->pdev->dev, 0, smcd->pnetid))
 		smc_pnetid_by_table_smcd(smcd);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 02/11] net/smc: introduce loopback-ism for SMC intra-OS shortcut
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
  2024-04-14  4:02 ` [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration Wen Gu
@ 2024-04-14  4:02 ` Wen Gu
  2024-04-17 10:20   ` Gerd Bayer
  2024-04-14  4:02 ` [PATCH net-next v6 03/11] net/smc: implement ID-related operations of loopback-ism Wen Gu
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:02 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

This introduces a kind of Emulated-ISM device named loopback-ism for
SMCv2.1. The loopback-ism device is currently exclusive for SMC usage,
and aims to provide an SMC shortcut for sockets within the same kernel,
leading to improved intra-OS traffic performance. Configuration of this
feature is managed through the config SMC_LO.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/Kconfig        |  13 ++++
 net/smc/Makefile       |   1 +
 net/smc/af_smc.c       |  12 +++-
 net/smc/smc_loopback.c | 156 +++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_loopback.h |  43 ++++++++++++
 5 files changed, 224 insertions(+), 1 deletion(-)
 create mode 100644 net/smc/smc_loopback.c
 create mode 100644 net/smc/smc_loopback.h

diff --git a/net/smc/Kconfig b/net/smc/Kconfig
index 746be3996768..ba5e6a2dd2fd 100644
--- a/net/smc/Kconfig
+++ b/net/smc/Kconfig
@@ -20,3 +20,16 @@ config SMC_DIAG
 	  smcss.
 
 	  if unsure, say Y.
+
+config SMC_LO
+	bool "SMC intra-OS shortcut with loopback-ism"
+	depends on SMC
+	default n
+	help
+	  SMC_LO enables the creation of an Emulated-ISM device named
+	  loopback-ism in SMC and makes use of it for transferring data
+	  when communication occurs within the same OS. This helps in
+	  convenient testing of SMC-D since loopback-ism is independent
+	  of architecture or hardware.
+
+	  if unsure, say N.
diff --git a/net/smc/Makefile b/net/smc/Makefile
index 875efcd126a2..2c510d543058 100644
--- a/net/smc/Makefile
+++ b/net/smc/Makefile
@@ -6,3 +6,4 @@ smc-y := af_smc.o smc_pnet.o smc_ib.o smc_clc.o smc_core.o smc_wr.o smc_llc.o
 smc-y += smc_cdc.o smc_tx.o smc_rx.o smc_close.o smc_ism.o smc_netlink.o smc_stats.o
 smc-y += smc_tracepoint.o
 smc-$(CONFIG_SYSCTL) += smc_sysctl.o
+smc-$(CONFIG_SMC_LO) += smc_loopback.o
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index e8dcd28a554c..47f3bc1470bc 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -53,6 +53,7 @@
 #include "smc_stats.h"
 #include "smc_tracepoint.h"
 #include "smc_sysctl.h"
+#include "smc_loopback.h"
 
 static DEFINE_MUTEX(smc_server_lgr_pending);	/* serialize link group
 						 * creation on server
@@ -3555,15 +3556,23 @@ static int __init smc_init(void)
 		goto out_sock;
 	}
 
+	rc = smc_loopback_init();
+	if (rc) {
+		pr_err("%s: smc_loopback_init fails with %d\n", __func__, rc);
+		goto out_ib;
+	}
+
 	rc = tcp_register_ulp(&smc_ulp_ops);
 	if (rc) {
 		pr_err("%s: tcp_ulp_register fails with %d\n", __func__, rc);
-		goto out_ib;
+		goto out_lo;
 	}
 
 	static_branch_enable(&tcp_have_smc);
 	return 0;
 
+out_lo:
+	smc_loopback_exit();
 out_ib:
 	smc_ib_unregister_client();
 out_sock:
@@ -3601,6 +3610,7 @@ static void __exit smc_exit(void)
 	tcp_unregister_ulp(&smc_ulp_ops);
 	sock_unregister(PF_SMC);
 	smc_core_exit();
+	smc_loopback_exit();
 	smc_ib_unregister_client();
 	smc_ism_exit();
 	destroy_workqueue(smc_close_wq);
diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
new file mode 100644
index 000000000000..c364e3e6e3fb
--- /dev/null
+++ b/net/smc/smc_loopback.c
@@ -0,0 +1,156 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ *  Shared Memory Communications Direct over loopback-ism device.
+ *
+ *  Functions for loopback-ism device.
+ *
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: Wen Gu <guwen@linux.alibaba.com>
+ *          Tony Lu <tonylu@linux.alibaba.com>
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/types.h>
+#include <net/smc.h>
+
+#include "smc_ism.h"
+#include "smc_loopback.h"
+
+static const char smc_lo_dev_name[] = "loopback-ism";
+static struct smc_lo_dev *lo_dev;
+
+static const struct smcd_ops lo_ops = {
+	.query_remote_gid	= NULL,
+	.register_dmb		= NULL,
+	.unregister_dmb		= NULL,
+	.add_vlan_id		= NULL,
+	.del_vlan_id		= NULL,
+	.set_vlan_required	= NULL,
+	.reset_vlan_required	= NULL,
+	.signal_event		= NULL,
+	.move_data		= NULL,
+	.supports_v2		= NULL,
+	.get_local_gid		= NULL,
+	.get_chid		= NULL,
+	.get_dev		= NULL,
+};
+
+static struct smcd_dev *smcd_lo_alloc_dev(const struct smcd_ops *ops,
+					  int max_dmbs)
+{
+	struct smcd_dev *smcd;
+
+	smcd = kzalloc(sizeof(*smcd), GFP_KERNEL);
+	if (!smcd)
+		return NULL;
+
+	smcd->conn = kcalloc(max_dmbs, sizeof(struct smc_connection *),
+			     GFP_KERNEL);
+	if (!smcd->conn)
+		goto out_smcd;
+
+	smcd->ops = ops;
+
+	spin_lock_init(&smcd->lock);
+	spin_lock_init(&smcd->lgr_lock);
+	INIT_LIST_HEAD(&smcd->vlan);
+	INIT_LIST_HEAD(&smcd->lgr_list);
+	init_waitqueue_head(&smcd->lgrs_deleted);
+	return smcd;
+
+out_smcd:
+	kfree(smcd);
+	return NULL;
+}
+
+static int smcd_lo_register_dev(struct smc_lo_dev *ldev)
+{
+	struct smcd_dev *smcd;
+
+	smcd = smcd_lo_alloc_dev(&lo_ops, SMC_LO_MAX_DMBS);
+	if (!smcd)
+		return -ENOMEM;
+	ldev->smcd = smcd;
+	smcd->priv = ldev;
+
+	/* TODO:
+	 * register loopback-ism to smcd_dev list.
+	 */
+	return 0;
+}
+
+static void smcd_lo_unregister_dev(struct smc_lo_dev *ldev)
+{
+	struct smcd_dev *smcd = ldev->smcd;
+
+	/* TODO:
+	 * unregister loopback-ism from smcd_dev list.
+	 */
+	kfree(smcd->conn);
+	kfree(smcd);
+}
+
+static int smc_lo_dev_init(struct smc_lo_dev *ldev)
+{
+	return smcd_lo_register_dev(ldev);
+}
+
+static void smc_lo_dev_exit(struct smc_lo_dev *ldev)
+{
+	smcd_lo_unregister_dev(ldev);
+}
+
+static void smc_lo_dev_release(struct device *dev)
+{
+	struct smc_lo_dev *ldev =
+		container_of(dev, struct smc_lo_dev, dev);
+
+	kfree(ldev);
+}
+
+static int smc_lo_dev_probe(void)
+{
+	struct smc_lo_dev *ldev;
+	int ret;
+
+	ldev = kzalloc(sizeof(*ldev), GFP_KERNEL);
+	if (!ldev)
+		return -ENOMEM;
+
+	ldev->dev.parent = NULL;
+	ldev->dev.release = smc_lo_dev_release;
+	device_initialize(&ldev->dev);
+	dev_set_name(&ldev->dev, smc_lo_dev_name);
+
+	ret = smc_lo_dev_init(ldev);
+	if (ret)
+		goto free_dev;
+
+	lo_dev = ldev; /* global loopback device */
+	return 0;
+
+free_dev:
+	put_device(&ldev->dev);
+	return ret;
+}
+
+static void smc_lo_dev_remove(void)
+{
+	if (!lo_dev)
+		return;
+
+	smc_lo_dev_exit(lo_dev);
+	put_device(&lo_dev->dev); /* device_initialize in smc_lo_dev_probe */
+}
+
+int smc_loopback_init(void)
+{
+	return smc_lo_dev_probe();
+}
+
+void smc_loopback_exit(void)
+{
+	smc_lo_dev_remove();
+}
diff --git a/net/smc/smc_loopback.h b/net/smc/smc_loopback.h
new file mode 100644
index 000000000000..c6c97e2c461d
--- /dev/null
+++ b/net/smc/smc_loopback.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ *  Shared Memory Communications Direct over loopback-ism device.
+ *
+ *  SMC-D loopback-ism device structure definitions.
+ *
+ *  Copyright (c) 2024, Alibaba Inc.
+ *
+ *  Author: Wen Gu <guwen@linux.alibaba.com>
+ *          Tony Lu <tonylu@linux.alibaba.com>
+ *
+ */
+
+#ifndef _SMC_LOOPBACK_H
+#define _SMC_LOOPBACK_H
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <net/smc.h>
+
+#if IS_ENABLED(CONFIG_SMC_LO)
+#define SMC_LO_MAX_DMBS		5000
+
+struct smc_lo_dev {
+	struct smcd_dev *smcd;
+	struct device dev;
+};
+
+int smc_loopback_init(void);
+void smc_loopback_exit(void);
+#else
+static inline int smc_loopback_init(void)
+{
+	return 0;
+}
+
+static inline void smc_loopback_exit(void)
+{
+	return;
+}
+#endif
+
+#endif /* _SMC_LOOPBACK_H */
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 03/11] net/smc: implement ID-related operations of loopback-ism
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
  2024-04-14  4:02 ` [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration Wen Gu
  2024-04-14  4:02 ` [PATCH net-next v6 02/11] net/smc: introduce loopback-ism for SMC intra-OS shortcut Wen Gu
@ 2024-04-14  4:02 ` Wen Gu
  2024-04-14  4:02 ` [PATCH net-next v6 04/11] net/smc: implement DMB-related " Wen Gu
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:02 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

This implements operations related to IDs for the loopback-ism device.
loopback-ism uses an Extended GID that is a 128-bit GID instead of the
existing ISM 64-bit GID, and uses the CHID defined with the reserved
value 0xFFFF.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_loopback.c | 62 ++++++++++++++++++++++++++++++++++++++----
 net/smc/smc_loopback.h |  3 ++
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
index c364e3e6e3fb..0349632a76c4 100644
--- a/net/smc/smc_loopback.c
+++ b/net/smc/smc_loopback.c
@@ -18,11 +18,62 @@
 #include "smc_ism.h"
 #include "smc_loopback.h"
 
+#define SMC_LO_V2_CAPABLE	0x1 /* loopback-ism acts as ISMv2 */
+
 static const char smc_lo_dev_name[] = "loopback-ism";
 static struct smc_lo_dev *lo_dev;
 
+static void smc_lo_generate_ids(struct smc_lo_dev *ldev)
+{
+	struct smcd_gid *lgid = &ldev->local_gid;
+	uuid_t uuid;
+
+	uuid_gen(&uuid);
+	memcpy(&lgid->gid, &uuid, sizeof(lgid->gid));
+	memcpy(&lgid->gid_ext, (u8 *)&uuid + sizeof(lgid->gid),
+	       sizeof(lgid->gid_ext));
+
+	ldev->chid = SMC_LO_RESERVED_CHID;
+}
+
+static int smc_lo_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
+			     u32 vid_valid, u32 vid)
+{
+	struct smc_lo_dev *ldev = smcd->priv;
+
+	/* rgid should be the same as lgid */
+	if (!ldev || rgid->gid != ldev->local_gid.gid ||
+	    rgid->gid_ext != ldev->local_gid.gid_ext)
+		return -ENETUNREACH;
+	return 0;
+}
+
+static int smc_lo_supports_v2(void)
+{
+	return SMC_LO_V2_CAPABLE;
+}
+
+static void smc_lo_get_local_gid(struct smcd_dev *smcd,
+				 struct smcd_gid *smcd_gid)
+{
+	struct smc_lo_dev *ldev = smcd->priv;
+
+	smcd_gid->gid = ldev->local_gid.gid;
+	smcd_gid->gid_ext = ldev->local_gid.gid_ext;
+}
+
+static u16 smc_lo_get_chid(struct smcd_dev *smcd)
+{
+	return ((struct smc_lo_dev *)smcd->priv)->chid;
+}
+
+static struct device *smc_lo_get_dev(struct smcd_dev *smcd)
+{
+	return &((struct smc_lo_dev *)smcd->priv)->dev;
+}
+
 static const struct smcd_ops lo_ops = {
-	.query_remote_gid	= NULL,
+	.query_remote_gid = smc_lo_query_rgid,
 	.register_dmb		= NULL,
 	.unregister_dmb		= NULL,
 	.add_vlan_id		= NULL,
@@ -31,10 +82,10 @@ static const struct smcd_ops lo_ops = {
 	.reset_vlan_required	= NULL,
 	.signal_event		= NULL,
 	.move_data		= NULL,
-	.supports_v2		= NULL,
-	.get_local_gid		= NULL,
-	.get_chid		= NULL,
-	.get_dev		= NULL,
+	.supports_v2 = smc_lo_supports_v2,
+	.get_local_gid = smc_lo_get_local_gid,
+	.get_chid = smc_lo_get_chid,
+	.get_dev = smc_lo_get_dev,
 };
 
 static struct smcd_dev *smcd_lo_alloc_dev(const struct smcd_ops *ops,
@@ -94,6 +145,7 @@ static void smcd_lo_unregister_dev(struct smc_lo_dev *ldev)
 
 static int smc_lo_dev_init(struct smc_lo_dev *ldev)
 {
+	smc_lo_generate_ids(ldev);
 	return smcd_lo_register_dev(ldev);
 }
 
diff --git a/net/smc/smc_loopback.h b/net/smc/smc_loopback.h
index c6c97e2c461d..c11529b15041 100644
--- a/net/smc/smc_loopback.h
+++ b/net/smc/smc_loopback.h
@@ -20,10 +20,13 @@
 
 #if IS_ENABLED(CONFIG_SMC_LO)
 #define SMC_LO_MAX_DMBS		5000
+#define SMC_LO_RESERVED_CHID	0xFFFF
 
 struct smc_lo_dev {
 	struct smcd_dev *smcd;
 	struct device dev;
+	u16 chid;
+	struct smcd_gid local_gid;
 };
 
 int smc_loopback_init(void);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 04/11] net/smc: implement DMB-related operations of loopback-ism
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (2 preceding siblings ...)
  2024-04-14  4:02 ` [PATCH net-next v6 03/11] net/smc: implement ID-related operations of loopback-ism Wen Gu
@ 2024-04-14  4:02 ` Wen Gu
  2024-04-14  4:02 ` [PATCH net-next v6 05/11] net/smc: mark optional smcd_ops and check for support when called Wen Gu
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:02 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

This implements DMB (un)registration and data move operations of
loopback-ism device.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_loopback.c | 129 ++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc_loopback.h |  13 +++++
 2 files changed, 139 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
index 0349632a76c4..d75852549b00 100644
--- a/net/smc/smc_loopback.c
+++ b/net/smc/smc_loopback.c
@@ -15,10 +15,12 @@
 #include <linux/types.h>
 #include <net/smc.h>
 
+#include "smc_cdc.h"
 #include "smc_ism.h"
 #include "smc_loopback.h"
 
 #define SMC_LO_V2_CAPABLE	0x1 /* loopback-ism acts as ISMv2 */
+#define SMC_DMA_ADDR_INVALID	(~(dma_addr_t)0)
 
 static const char smc_lo_dev_name[] = "loopback-ism";
 static struct smc_lo_dev *lo_dev;
@@ -48,6 +50,125 @@ static int smc_lo_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
 	return 0;
 }
 
+static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
+			       void *client_priv)
+{
+	struct smc_lo_dmb_node *dmb_node, *tmp_node;
+	struct smc_lo_dev *ldev = smcd->priv;
+	int sba_idx, rc;
+
+	/* check space for new dmb */
+	for_each_clear_bit(sba_idx, ldev->sba_idx_mask, SMC_LO_MAX_DMBS) {
+		if (!test_and_set_bit(sba_idx, ldev->sba_idx_mask))
+			break;
+	}
+	if (sba_idx == SMC_LO_MAX_DMBS)
+		return -ENOSPC;
+
+	dmb_node = kzalloc(sizeof(*dmb_node), GFP_KERNEL);
+	if (!dmb_node) {
+		rc = -ENOMEM;
+		goto err_bit;
+	}
+
+	dmb_node->sba_idx = sba_idx;
+	dmb_node->len = dmb->dmb_len;
+	dmb_node->cpu_addr = kzalloc(dmb_node->len, GFP_KERNEL |
+				     __GFP_NOWARN | __GFP_NORETRY |
+				     __GFP_NOMEMALLOC);
+	if (!dmb_node->cpu_addr) {
+		rc = -ENOMEM;
+		goto err_node;
+	}
+	dmb_node->dma_addr = SMC_DMA_ADDR_INVALID;
+
+again:
+	/* add new dmb into hash table */
+	get_random_bytes(&dmb_node->token, sizeof(dmb_node->token));
+	write_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_node->token) {
+		if (tmp_node->token == dmb_node->token) {
+			write_unlock_bh(&ldev->dmb_ht_lock);
+			goto again;
+		}
+	}
+	hash_add(ldev->dmb_ht, &dmb_node->list, dmb_node->token);
+	write_unlock_bh(&ldev->dmb_ht_lock);
+
+	dmb->sba_idx = dmb_node->sba_idx;
+	dmb->dmb_tok = dmb_node->token;
+	dmb->cpu_addr = dmb_node->cpu_addr;
+	dmb->dma_addr = dmb_node->dma_addr;
+	dmb->dmb_len = dmb_node->len;
+
+	return 0;
+
+err_node:
+	kfree(dmb_node);
+err_bit:
+	clear_bit(sba_idx, ldev->sba_idx_mask);
+	return rc;
+}
+
+static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
+{
+	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
+	struct smc_lo_dev *ldev = smcd->priv;
+
+	/* remove dmb from hash table */
+	write_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) {
+		if (tmp_node->token == dmb->dmb_tok) {
+			dmb_node = tmp_node;
+			break;
+		}
+	}
+	if (!dmb_node) {
+		write_unlock_bh(&ldev->dmb_ht_lock);
+		return -EINVAL;
+	}
+	hash_del(&dmb_node->list);
+	write_unlock_bh(&ldev->dmb_ht_lock);
+
+	clear_bit(dmb_node->sba_idx, ldev->sba_idx_mask);
+	kfree(dmb_node->cpu_addr);
+	kfree(dmb_node);
+
+	return 0;
+}
+
+static int smc_lo_move_data(struct smcd_dev *smcd, u64 dmb_tok,
+			    unsigned int idx, bool sf, unsigned int offset,
+			    void *data, unsigned int size)
+{
+	struct smc_lo_dmb_node *rmb_node = NULL, *tmp_node;
+	struct smc_lo_dev *ldev = smcd->priv;
+	struct smc_connection *conn;
+
+	read_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
+		if (tmp_node->token == dmb_tok) {
+			rmb_node = tmp_node;
+			break;
+		}
+	}
+	if (!rmb_node) {
+		read_unlock_bh(&ldev->dmb_ht_lock);
+		return -EINVAL;
+	}
+	memcpy((char *)rmb_node->cpu_addr + offset, data, size);
+	read_unlock_bh(&ldev->dmb_ht_lock);
+
+	if (sf) {
+		conn = smcd->conn[rmb_node->sba_idx];
+		if (conn && !conn->killed)
+			tasklet_schedule(&conn->rx_tsklet);
+		else
+			return -EPIPE;
+	}
+	return 0;
+}
+
 static int smc_lo_supports_v2(void)
 {
 	return SMC_LO_V2_CAPABLE;
@@ -74,14 +195,14 @@ static struct device *smc_lo_get_dev(struct smcd_dev *smcd)
 
 static const struct smcd_ops lo_ops = {
 	.query_remote_gid = smc_lo_query_rgid,
-	.register_dmb		= NULL,
-	.unregister_dmb		= NULL,
+	.register_dmb = smc_lo_register_dmb,
+	.unregister_dmb = smc_lo_unregister_dmb,
 	.add_vlan_id		= NULL,
 	.del_vlan_id		= NULL,
 	.set_vlan_required	= NULL,
 	.reset_vlan_required	= NULL,
 	.signal_event		= NULL,
-	.move_data		= NULL,
+	.move_data = smc_lo_move_data,
 	.supports_v2 = smc_lo_supports_v2,
 	.get_local_gid = smc_lo_get_local_gid,
 	.get_chid = smc_lo_get_chid,
@@ -146,6 +267,8 @@ static void smcd_lo_unregister_dev(struct smc_lo_dev *ldev)
 static int smc_lo_dev_init(struct smc_lo_dev *ldev)
 {
 	smc_lo_generate_ids(ldev);
+	rwlock_init(&ldev->dmb_ht_lock);
+	hash_init(ldev->dmb_ht);
 	return smcd_lo_register_dev(ldev);
 }
 
diff --git a/net/smc/smc_loopback.h b/net/smc/smc_loopback.h
index c11529b15041..4b36acbc27da 100644
--- a/net/smc/smc_loopback.h
+++ b/net/smc/smc_loopback.h
@@ -20,13 +20,26 @@
 
 #if IS_ENABLED(CONFIG_SMC_LO)
 #define SMC_LO_MAX_DMBS		5000
+#define SMC_LO_DMBS_HASH_BITS	12
 #define SMC_LO_RESERVED_CHID	0xFFFF
 
+struct smc_lo_dmb_node {
+	struct hlist_node list;
+	u64 token;
+	u32 len;
+	u32 sba_idx;
+	void *cpu_addr;
+	dma_addr_t dma_addr;
+};
+
 struct smc_lo_dev {
 	struct smcd_dev *smcd;
 	struct device dev;
 	u16 chid;
 	struct smcd_gid local_gid;
+	rwlock_t dmb_ht_lock;
+	DECLARE_BITMAP(sba_idx_mask, SMC_LO_MAX_DMBS);
+	DECLARE_HASHTABLE(dmb_ht, SMC_LO_DMBS_HASH_BITS);
 };
 
 int smc_loopback_init(void);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 05/11] net/smc: mark optional smcd_ops and check for support when called
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (3 preceding siblings ...)
  2024-04-14  4:02 ` [PATCH net-next v6 04/11] net/smc: implement DMB-related " Wen Gu
@ 2024-04-14  4:02 ` Wen Gu
  2024-04-14  4:02 ` [PATCH net-next v6 06/11] net/smc: ignore loopback-ism when dumping SMC-D devices Wen Gu
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:02 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

Some operations are not supported by new introduced Emulated-ISM, so
mark them as optional and check if the device supports them when called.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 include/net/smc.h | 14 ++++++++------
 net/smc/smc_ism.c |  9 ++++++++-
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/net/smc.h b/include/net/smc.h
index 542d12372c18..33b753115e43 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -59,12 +59,6 @@ struct smcd_ops {
 	int (*register_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb,
 			    void *client);
 	int (*unregister_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
-	int (*add_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
-	int (*del_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
-	int (*set_vlan_required)(struct smcd_dev *dev);
-	int (*reset_vlan_required)(struct smcd_dev *dev);
-	int (*signal_event)(struct smcd_dev *dev, struct smcd_gid *rgid,
-			    u32 trigger_irq, u32 event_code, u64 info);
 	int (*move_data)(struct smcd_dev *dev, u64 dmb_tok, unsigned int idx,
 			 bool sf, unsigned int offset, void *data,
 			 unsigned int size);
@@ -72,6 +66,14 @@ struct smcd_ops {
 	void (*get_local_gid)(struct smcd_dev *dev, struct smcd_gid *gid);
 	u16 (*get_chid)(struct smcd_dev *dev);
 	struct device* (*get_dev)(struct smcd_dev *dev);
+
+	/* optional operations */
+	int (*add_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
+	int (*del_vlan_id)(struct smcd_dev *dev, u64 vlan_id);
+	int (*set_vlan_required)(struct smcd_dev *dev);
+	int (*reset_vlan_required)(struct smcd_dev *dev);
+	int (*signal_event)(struct smcd_dev *dev, struct smcd_gid *rgid,
+			    u32 trigger_irq, u32 event_code, u64 info);
 };
 
 struct smcd_dev {
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index 051726586730..36459a3c739d 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -126,6 +126,8 @@ int smc_ism_get_vlan(struct smcd_dev *smcd, unsigned short vlanid)
 
 	if (!vlanid)			/* No valid vlan id */
 		return -EINVAL;
+	if (!smcd->ops->add_vlan_id)
+		return -EOPNOTSUPP;
 
 	/* create new vlan entry, in case we need it */
 	new_vlan = kzalloc(sizeof(*new_vlan), GFP_KERNEL);
@@ -171,6 +173,8 @@ int smc_ism_put_vlan(struct smcd_dev *smcd, unsigned short vlanid)
 
 	if (!vlanid)			/* No valid vlan id */
 		return -EINVAL;
+	if (!smcd->ops->del_vlan_id)
+		return -EOPNOTSUPP;
 
 	spin_lock_irqsave(&smcd->lock, flags);
 	list_for_each_entry(vlan, &smcd->vlan, list) {
@@ -368,7 +372,8 @@ static void smcd_handle_sw_event(struct smc_ism_event_work *wrk)
 		smc_smcd_terminate(wrk->smcd, &peer_gid, ev_info.vlan_id);
 		break;
 	case ISM_EVENT_CODE_TESTLINK:	/* Activity timer */
-		if (ev_info.code == ISM_EVENT_REQUEST) {
+		if (ev_info.code == ISM_EVENT_REQUEST &&
+		    wrk->smcd->ops->signal_event) {
 			ev_info.code = ISM_EVENT_RESPONSE;
 			wrk->smcd->ops->signal_event(wrk->smcd,
 						     &peer_gid,
@@ -538,6 +543,8 @@ int smc_ism_signal_shutdown(struct smc_link_group *lgr)
 
 	if (lgr->peer_shutdown)
 		return 0;
+	if (!lgr->smcd->ops->signal_event)
+		return 0;
 
 	memcpy(ev_info.uid, lgr->id, SMC_LGR_ID_SIZE);
 	ev_info.vlan_id = lgr->vlan_id;
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 06/11] net/smc: ignore loopback-ism when dumping SMC-D devices
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (4 preceding siblings ...)
  2024-04-14  4:02 ` [PATCH net-next v6 05/11] net/smc: mark optional smcd_ops and check for support when called Wen Gu
@ 2024-04-14  4:02 ` Wen Gu
  2024-04-14  4:03 ` [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list Wen Gu
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:02 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

Since loopback-ism is not a PCI device, the PCI information fed back by
smc_nl_handle_smcd_dev() does not apply to loopback-ism. So currently
ignore loopback-ism when dumping SMC-D devices. The netlink function of
loopback-ism will be refactored when SMC netlink interface is updated.

Link: https://lore.kernel.org/r/caab067b-f5c3-490f-9259-262624c236b4@linux.ibm.com/
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_ism.c | 2 ++
 net/smc/smc_ism.h | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index 36459a3c739d..188fd28423c2 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -322,6 +322,8 @@ static void smc_nl_prep_smcd_dev(struct smcd_dev_list *dev_list,
 	list_for_each_entry(smcd, &dev_list->list, list) {
 		if (num < snum)
 			goto next;
+		if (smc_ism_is_loopback(smcd))
+			goto next;
 		if (smc_nl_handle_smcd_dev(smcd, skb, cb))
 			goto errout;
 next:
diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h
index 165cd013404b..322973527c61 100644
--- a/net/smc/smc_ism.h
+++ b/net/smc/smc_ism.h
@@ -84,4 +84,9 @@ static inline bool smc_ism_is_emulated(struct smcd_dev *smcd)
 	return __smc_ism_is_emulated(chid);
 }
 
+static inline bool smc_ism_is_loopback(struct smcd_dev *smcd)
+{
+	return (smcd->ops->get_chid(smcd) == 0xFFFF);
+}
+
 #endif
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (5 preceding siblings ...)
  2024-04-14  4:02 ` [PATCH net-next v6 06/11] net/smc: ignore loopback-ism when dumping SMC-D devices Wen Gu
@ 2024-04-14  4:03 ` Wen Gu
  2024-04-25 11:29   ` Wenjia Zhang
  2024-04-14  4:03 ` [PATCH net-next v6 08/11] net/smc: add operations to merge sndbuf with peer DMB Wen Gu
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:03 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

After the loopback-ism device is ready, add it to the SMC-D device list
as an ISMv2 device, and always keep it at the beginning to ensure it is
preferred for providing a shortcut for data transfer within the same
kernel.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_ism.c      | 30 +++++++++++++++++++++---------
 net/smc/smc_ism.h      |  1 +
 net/smc/smc_loopback.c | 20 +++++++++++++-------
 3 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index 188fd28423c2..6bed0a61b746 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -91,6 +91,11 @@ bool smc_ism_is_v2_capable(void)
 	return smc_ism_v2_capable;
 }
 
+void smc_ism_set_v2_capable(void)
+{
+	smc_ism_v2_capable = true;
+}
+
 /* Set a connection using this DMBE. */
 void smc_ism_set_conn(struct smc_connection *conn)
 {
@@ -439,7 +444,7 @@ static struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name,
 static void smcd_register_dev(struct ism_dev *ism)
 {
 	const struct smcd_ops *ops = ism_get_smcd_ops();
-	struct smcd_dev *smcd;
+	struct smcd_dev *smcd, *fentry;
 
 	if (!ops)
 		return;
@@ -454,16 +459,23 @@ static void smcd_register_dev(struct ism_dev *ism)
 	if (smc_pnetid_by_dev_port(&ism->pdev->dev, 0, smcd->pnetid))
 		smc_pnetid_by_table_smcd(smcd);
 
+	if (smcd->ops->supports_v2())
+		smc_ism_set_v2_capable();
 	mutex_lock(&smcd_dev_list.mutex);
-	if (list_empty(&smcd_dev_list.list)) {
-		if (smcd->ops->supports_v2())
-			smc_ism_v2_capable = true;
-	}
-	/* sort list: devices without pnetid before devices with pnetid */
-	if (smcd->pnetid[0])
+	/* sort list:
+	 * - devices without pnetid before devices with pnetid;
+	 * - loopback-ism always at the very beginning;
+	 */
+	if (!smcd->pnetid[0]) {
+		fentry = list_first_entry_or_null(&smcd_dev_list.list,
+						  struct smcd_dev, list);
+		if (fentry && smc_ism_is_loopback(fentry))
+			list_add(&smcd->list, &fentry->list);
+		else
+			list_add(&smcd->list, &smcd_dev_list.list);
+	} else {
 		list_add_tail(&smcd->list, &smcd_dev_list.list);
-	else
-		list_add(&smcd->list, &smcd_dev_list.list);
+	}
 	mutex_unlock(&smcd_dev_list.mutex);
 
 	pr_warn_ratelimited("smc: adding smcd device %s with pnetid %.16s%s\n",
diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h
index 322973527c61..e6f57e5e1ef9 100644
--- a/net/smc/smc_ism.h
+++ b/net/smc/smc_ism.h
@@ -52,6 +52,7 @@ int smc_ism_signal_shutdown(struct smc_link_group *lgr);
 void smc_ism_get_system_eid(u8 **eid);
 u16 smc_ism_get_chid(struct smcd_dev *dev);
 bool smc_ism_is_v2_capable(void);
+void smc_ism_set_v2_capable(void);
 int smc_ism_init(void);
 void smc_ism_exit(void);
 int smcd_nl_get_device(struct sk_buff *skb, struct netlink_callback *cb);
diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
index d75852549b00..94a57f4ee3f9 100644
--- a/net/smc/smc_loopback.c
+++ b/net/smc/smc_loopback.c
@@ -246,10 +246,12 @@ static int smcd_lo_register_dev(struct smc_lo_dev *ldev)
 		return -ENOMEM;
 	ldev->smcd = smcd;
 	smcd->priv = ldev;
-
-	/* TODO:
-	 * register loopback-ism to smcd_dev list.
-	 */
+	smc_ism_set_v2_capable();
+	mutex_lock(&smcd_dev_list.mutex);
+	list_add(&smcd->list, &smcd_dev_list.list);
+	mutex_unlock(&smcd_dev_list.mutex);
+	pr_warn_ratelimited("smc: adding smcd device %s\n",
+			    dev_name(&ldev->dev));
 	return 0;
 }
 
@@ -257,9 +259,13 @@ static void smcd_lo_unregister_dev(struct smc_lo_dev *ldev)
 {
 	struct smcd_dev *smcd = ldev->smcd;
 
-	/* TODO:
-	 * unregister loopback-ism from smcd_dev list.
-	 */
+	pr_warn_ratelimited("smc: removing smcd device %s\n",
+			    dev_name(&ldev->dev));
+	smcd->going_away = 1;
+	smc_smcd_terminate_all(smcd);
+	mutex_lock(&smcd_dev_list.mutex);
+	list_del_init(&smcd->list);
+	mutex_unlock(&smcd_dev_list.mutex);
 	kfree(smcd->conn);
 	kfree(smcd);
 }
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 08/11] net/smc: add operations to merge sndbuf with peer DMB
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (6 preceding siblings ...)
  2024-04-14  4:03 ` [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list Wen Gu
@ 2024-04-14  4:03 ` Wen Gu
  2024-04-14  4:03 ` [PATCH net-next v6 09/11] net/smc: {at|de}tach sndbuf to peer DMB if supported Wen Gu
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:03 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

In some scenarios using Emulated-ISM device, sndbuf can share the same
physical memory region with peer DMB to avoid data copy from one side
to the other. In such case the sndbuf is only a descriptor that
describes the shared memory and does not actually occupy memory, it's
more like a ghost buffer.

      +----------+                     +----------+
      | socket A |                     | socket B |
      +----------+                     +----------+
            |                               |
       +--------+                       +--------+
       | sndbuf |                       |  DMB   |
       |  desc  |                       |  desc  |
       +--------+                       +--------+
            |                               |
            |                          +----v-----+
            +-------------------------->  memory  |
                                       +----------+

So here introduces three new SMC-D device operations to check if this
feature is supported by device, and to {attach|detach} ghost sndbuf to
peer DMB. For now only loopback-ism supports this.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 include/net/smc.h |  3 +++
 net/smc/smc_ism.c | 40 ++++++++++++++++++++++++++++++++++++++++
 net/smc/smc_ism.h |  4 ++++
 3 files changed, 47 insertions(+)

diff --git a/include/net/smc.h b/include/net/smc.h
index 33b753115e43..db84e4e35080 100644
--- a/include/net/smc.h
+++ b/include/net/smc.h
@@ -74,6 +74,9 @@ struct smcd_ops {
 	int (*reset_vlan_required)(struct smcd_dev *dev);
 	int (*signal_event)(struct smcd_dev *dev, struct smcd_gid *rgid,
 			    u32 trigger_irq, u32 event_code, u64 info);
+	int (*support_dmb_nocopy)(struct smcd_dev *dev);
+	int (*attach_dmb)(struct smcd_dev *dev, struct smcd_dmb *dmb);
+	int (*detach_dmb)(struct smcd_dev *dev, u64 token);
 };
 
 struct smcd_dev {
diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
index 6bed0a61b746..84f98e18c7db 100644
--- a/net/smc/smc_ism.c
+++ b/net/smc/smc_ism.c
@@ -250,6 +250,46 @@ int smc_ism_register_dmb(struct smc_link_group *lgr, int dmb_len,
 	return rc;
 }
 
+bool smc_ism_support_dmb_nocopy(struct smcd_dev *smcd)
+{
+	/* for now only loopback-ism supports
+	 * merging sndbuf with peer DMB to avoid
+	 * data copies between them.
+	 */
+	return (smcd->ops->support_dmb_nocopy &&
+		smcd->ops->support_dmb_nocopy(smcd));
+}
+
+int smc_ism_attach_dmb(struct smcd_dev *dev, u64 token,
+		       struct smc_buf_desc *dmb_desc)
+{
+	struct smcd_dmb dmb;
+	int rc = 0;
+
+	if (!dev->ops->attach_dmb)
+		return -EINVAL;
+
+	memset(&dmb, 0, sizeof(dmb));
+	dmb.dmb_tok = token;
+	rc = dev->ops->attach_dmb(dev, &dmb);
+	if (!rc) {
+		dmb_desc->sba_idx = dmb.sba_idx;
+		dmb_desc->token = dmb.dmb_tok;
+		dmb_desc->cpu_addr = dmb.cpu_addr;
+		dmb_desc->dma_addr = dmb.dma_addr;
+		dmb_desc->len = dmb.dmb_len;
+	}
+	return rc;
+}
+
+int smc_ism_detach_dmb(struct smcd_dev *dev, u64 token)
+{
+	if (!dev->ops->detach_dmb)
+		return -EINVAL;
+
+	return dev->ops->detach_dmb(dev, token);
+}
+
 static int smc_nl_handle_smcd_dev(struct smcd_dev *smcd,
 				  struct sk_buff *skb,
 				  struct netlink_callback *cb)
diff --git a/net/smc/smc_ism.h b/net/smc/smc_ism.h
index e6f57e5e1ef9..6763133dd8d0 100644
--- a/net/smc/smc_ism.h
+++ b/net/smc/smc_ism.h
@@ -48,6 +48,10 @@ int smc_ism_put_vlan(struct smcd_dev *dev, unsigned short vlan_id);
 int smc_ism_register_dmb(struct smc_link_group *lgr, int buf_size,
 			 struct smc_buf_desc *dmb_desc);
 int smc_ism_unregister_dmb(struct smcd_dev *dev, struct smc_buf_desc *dmb_desc);
+bool smc_ism_support_dmb_nocopy(struct smcd_dev *smcd);
+int smc_ism_attach_dmb(struct smcd_dev *dev, u64 token,
+		       struct smc_buf_desc *dmb_desc);
+int smc_ism_detach_dmb(struct smcd_dev *dev, u64 token);
 int smc_ism_signal_shutdown(struct smc_link_group *lgr);
 void smc_ism_get_system_eid(u8 **eid);
 u16 smc_ism_get_chid(struct smcd_dev *dev);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 09/11] net/smc: {at|de}tach sndbuf to peer DMB if supported
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (7 preceding siblings ...)
  2024-04-14  4:03 ` [PATCH net-next v6 08/11] net/smc: add operations to merge sndbuf with peer DMB Wen Gu
@ 2024-04-14  4:03 ` Wen Gu
  2024-04-14  4:03 ` [PATCH net-next v6 10/11] net/smc: adapt cursor update when sndbuf and peer DMB are merged Wen Gu
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:03 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

If the device used by SMC-D supports merging local sndbuf to peer DMB,
then create sndbuf descriptor and attach it to peer DMB once peer
token is obtained, and detach and free the sndbuf descriptor when the
connection is freed.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/af_smc.c   | 16 ++++++++++++
 net/smc/smc_core.c | 61 +++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc_core.h |  1 +
 3 files changed, 77 insertions(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 47f3bc1470bc..9389f0cfa374 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1436,6 +1436,14 @@ static int smc_connect_ism(struct smc_sock *smc,
 	}
 
 	smc_conn_save_peer_info(smc, aclc);
+
+	if (smc_ism_support_dmb_nocopy(smc->conn.lgr->smcd)) {
+		rc = smcd_buf_attach(smc);
+		if (rc) {
+			rc = SMC_CLC_DECL_MEM;	/* try to fallback */
+			goto connect_abort;
+		}
+	}
 	smc_close_init(smc);
 	smc_rx_init(smc);
 	smc_tx_init(smc);
@@ -2540,6 +2548,14 @@ static void smc_listen_work(struct work_struct *work)
 		mutex_unlock(&smc_server_lgr_pending);
 	}
 	smc_conn_save_peer_info(new_smc, cclc);
+
+	if (ini->is_smcd &&
+	    smc_ism_support_dmb_nocopy(new_smc->conn.lgr->smcd)) {
+		rc = smcd_buf_attach(new_smc);
+		if (rc)
+			goto out_decl;
+	}
+
 	smc_listen_out_connected(new_smc);
 	SMC_STAT_SERV_SUCC_INC(sock_net(newclcsock->sk), ini);
 	goto out_free;
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index 9b84d5897aa5..fafdb97adfad 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -1149,6 +1149,20 @@ static void smcr_buf_unuse(struct smc_buf_desc *buf_desc, bool is_rmb,
 	}
 }
 
+static void smcd_buf_detach(struct smc_connection *conn)
+{
+	struct smcd_dev *smcd = conn->lgr->smcd;
+	u64 peer_token = conn->peer_token;
+
+	if (!conn->sndbuf_desc)
+		return;
+
+	smc_ism_detach_dmb(smcd, peer_token);
+
+	kfree(conn->sndbuf_desc);
+	conn->sndbuf_desc = NULL;
+}
+
 static void smc_buf_unuse(struct smc_connection *conn,
 			  struct smc_link_group *lgr)
 {
@@ -1192,6 +1206,8 @@ void smc_conn_free(struct smc_connection *conn)
 	if (lgr->is_smcd) {
 		if (!list_empty(&lgr->list))
 			smc_ism_unset_conn(conn);
+		if (smc_ism_support_dmb_nocopy(lgr->smcd))
+			smcd_buf_detach(conn);
 		tasklet_kill(&conn->rx_tsklet);
 	} else {
 		smc_cdc_wait_pend_tx_wr(conn);
@@ -1445,6 +1461,8 @@ static void smc_conn_kill(struct smc_connection *conn, bool soft)
 	smc_sk_wake_ups(smc);
 	if (conn->lgr->is_smcd) {
 		smc_ism_unset_conn(conn);
+		if (smc_ism_support_dmb_nocopy(conn->lgr->smcd))
+			smcd_buf_detach(conn);
 		if (soft)
 			tasklet_kill(&conn->rx_tsklet);
 		else
@@ -2464,12 +2482,18 @@ int smc_buf_create(struct smc_sock *smc, bool is_smcd)
 	int rc;
 
 	/* create send buffer */
+	if (is_smcd &&
+	    smc_ism_support_dmb_nocopy(smc->conn.lgr->smcd))
+		goto create_rmb;
+
 	rc = __smc_buf_create(smc, is_smcd, false);
 	if (rc)
 		return rc;
+
+create_rmb:
 	/* create rmb */
 	rc = __smc_buf_create(smc, is_smcd, true);
-	if (rc) {
+	if (rc && smc->conn.sndbuf_desc) {
 		down_write(&smc->conn.lgr->sndbufs_lock);
 		list_del(&smc->conn.sndbuf_desc->list);
 		up_write(&smc->conn.lgr->sndbufs_lock);
@@ -2479,6 +2503,41 @@ int smc_buf_create(struct smc_sock *smc, bool is_smcd)
 	return rc;
 }
 
+int smcd_buf_attach(struct smc_sock *smc)
+{
+	struct smc_connection *conn = &smc->conn;
+	struct smcd_dev *smcd = conn->lgr->smcd;
+	u64 peer_token = conn->peer_token;
+	struct smc_buf_desc *buf_desc;
+	int rc;
+
+	buf_desc = kzalloc(sizeof(*buf_desc), GFP_KERNEL);
+	if (!buf_desc)
+		return -ENOMEM;
+
+	/* The ghost sndbuf_desc describes the same memory region as
+	 * peer RMB. Its lifecycle is consistent with the connection's
+	 * and it will be freed with the connections instead of the
+	 * link group.
+	 */
+	rc = smc_ism_attach_dmb(smcd, peer_token, buf_desc);
+	if (rc)
+		goto free;
+
+	smc->sk.sk_sndbuf = buf_desc->len;
+	buf_desc->cpu_addr =
+		(u8 *)buf_desc->cpu_addr + sizeof(struct smcd_cdc_msg);
+	buf_desc->len -= sizeof(struct smcd_cdc_msg);
+	conn->sndbuf_desc = buf_desc;
+	conn->sndbuf_desc->used = 1;
+	atomic_set(&conn->sndbuf_space, conn->sndbuf_desc->len);
+	return 0;
+
+free:
+	kfree(buf_desc);
+	return rc;
+}
+
 static inline int smc_rmb_reserve_rtoken_idx(struct smc_link_group *lgr)
 {
 	int i;
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 1f175376037b..d93cf51dbd7c 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -557,6 +557,7 @@ void smc_smcd_terminate(struct smcd_dev *dev, struct smcd_gid *peer_gid,
 void smc_smcd_terminate_all(struct smcd_dev *dev);
 void smc_smcr_terminate_all(struct smc_ib_device *smcibdev);
 int smc_buf_create(struct smc_sock *smc, bool is_smcd);
+int smcd_buf_attach(struct smc_sock *smc);
 int smc_uncompress_bufsize(u8 compressed);
 int smc_rmb_rtoken_handling(struct smc_connection *conn, struct smc_link *link,
 			    struct smc_clc_msg_accept_confirm *clc);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 10/11] net/smc: adapt cursor update when sndbuf and peer DMB are merged
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (8 preceding siblings ...)
  2024-04-14  4:03 ` [PATCH net-next v6 09/11] net/smc: {at|de}tach sndbuf to peer DMB if supported Wen Gu
@ 2024-04-14  4:03 ` Wen Gu
  2024-04-16 11:05   ` Simon Horman
  2024-04-14  4:03 ` [PATCH net-next v6 11/11] net/smc: implement DMB-merged operations of loopback-ism Wen Gu
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:03 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

If the local sndbuf shares the same physical memory with peer DMB,
the cursor update processing needs to be adapted to ensure that the
data to be consumed won't be overwritten.

So in this case, the fin_curs and sndbuf_space that were originally
updated after sending the CDC message should be modified to not be
update until the peer updates cons_curs.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_cdc.c | 34 +++++++++++++++++++++++++++++++++-
 1 file changed, 33 insertions(+), 1 deletion(-)

diff --git a/net/smc/smc_cdc.c b/net/smc/smc_cdc.c
index 3c06625ceb20..5e95347ae497 100644
--- a/net/smc/smc_cdc.c
+++ b/net/smc/smc_cdc.c
@@ -18,6 +18,7 @@
 #include "smc_tx.h"
 #include "smc_rx.h"
 #include "smc_close.h"
+#include "smc_ism.h"
 
 /********************************** send *************************************/
 
@@ -255,6 +256,14 @@ int smcd_cdc_msg_send(struct smc_connection *conn)
 		return rc;
 	smc_curs_copy(&conn->rx_curs_confirmed, &curs, conn);
 	conn->local_rx_ctrl.prod_flags.cons_curs_upd_req = 0;
+
+	if (smc_ism_support_dmb_nocopy(conn->lgr->smcd))
+		/* if local sndbuf shares the same memory region with
+		 * peer DMB, then don't update the tx_curs_fin
+		 * and sndbuf_space until peer has consumed the data.
+		 */
+		return rc;
+
 	/* Calculate transmitted data and increment free send buffer space */
 	diff = smc_curs_diff(conn->sndbuf_desc->len, &conn->tx_curs_fin,
 			     &conn->tx_curs_sent);
@@ -323,7 +332,7 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
 {
 	union smc_host_cursor cons_old, prod_old;
 	struct smc_connection *conn = &smc->conn;
-	int diff_cons, diff_prod;
+	int diff_cons, diff_prod, diff_tx;
 
 	smc_curs_copy(&prod_old, &conn->local_rx_ctrl.prod, conn);
 	smc_curs_copy(&cons_old, &conn->local_rx_ctrl.cons, conn);
@@ -339,6 +348,29 @@ static void smc_cdc_msg_recv_action(struct smc_sock *smc,
 		atomic_add(diff_cons, &conn->peer_rmbe_space);
 		/* guarantee 0 <= peer_rmbe_space <= peer_rmbe_size */
 		smp_mb__after_atomic();
+
+		/* if local sndbuf shares the same memory region with
+		 * peer RMB, then update tx_curs_fin and sndbuf_space
+		 * here since peer has already consumed the data.
+		 */
+		if (conn->lgr->is_smcd &&
+		    smc_ism_support_dmb_nocopy(conn->lgr->smcd)) {
+			/* Calculate consumed data and
+			 * increment free send buffer space.
+			 */
+			diff_tx = smc_curs_diff(conn->sndbuf_desc->len,
+						&conn->tx_curs_fin,
+						&conn->local_rx_ctrl.cons);
+			/* increase local sndbuf space and fin_curs */
+			smp_mb__before_atomic();
+			atomic_add(diff_tx, &conn->sndbuf_space);
+			/* guarantee 0 <= sndbuf_space <= sndbuf_desc->len */
+			smp_mb__after_atomic();
+			smc_curs_copy(&conn->tx_curs_fin,
+				      &conn->local_rx_ctrl.cons, conn);
+
+			smc_tx_sndbuf_nonfull(smc);
+		}
 	}
 
 	diff_prod = smc_curs_diff(conn->rmb_desc->len, &prod_old,
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH net-next v6 11/11] net/smc: implement DMB-merged operations of loopback-ism
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (9 preceding siblings ...)
  2024-04-14  4:03 ` [PATCH net-next v6 10/11] net/smc: adapt cursor update when sndbuf and peer DMB are merged Wen Gu
@ 2024-04-14  4:03 ` Wen Gu
  2024-04-17 15:36 ` [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wenjia Zhang
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-14  4:03 UTC (permalink / raw
  To: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, guwen, linux-kernel,
	linux-s390, netdev

This implements operations related to merging sndbuf with peer DMB in
loopback-ism. The DMB won't be freed until no sndbuf is attached to it.

Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_loopback.c | 120 +++++++++++++++++++++++++++++++++++------
 net/smc/smc_loopback.h |   3 ++
 2 files changed, 108 insertions(+), 15 deletions(-)

diff --git a/net/smc/smc_loopback.c b/net/smc/smc_loopback.c
index 94a57f4ee3f9..3c5f64ca4115 100644
--- a/net/smc/smc_loopback.c
+++ b/net/smc/smc_loopback.c
@@ -20,6 +20,7 @@
 #include "smc_loopback.h"
 
 #define SMC_LO_V2_CAPABLE	0x1 /* loopback-ism acts as ISMv2 */
+#define SMC_LO_SUPPORT_NOCOPY	0x1
 #define SMC_DMA_ADDR_INVALID	(~(dma_addr_t)0)
 
 static const char smc_lo_dev_name[] = "loopback-ism";
@@ -81,6 +82,7 @@ static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
 		goto err_node;
 	}
 	dmb_node->dma_addr = SMC_DMA_ADDR_INVALID;
+	refcount_set(&dmb_node->refcnt, 1);
 
 again:
 	/* add new dmb into hash table */
@@ -94,6 +96,7 @@ static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
 	}
 	hash_add(ldev->dmb_ht, &dmb_node->list, dmb_node->token);
 	write_unlock_bh(&ldev->dmb_ht_lock);
+	atomic_inc(&ldev->dmb_cnt);
 
 	dmb->sba_idx = dmb_node->sba_idx;
 	dmb->dmb_tok = dmb_node->token;
@@ -110,13 +113,29 @@ static int smc_lo_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
 	return rc;
 }
 
+static void __smc_lo_unregister_dmb(struct smc_lo_dev *ldev,
+				    struct smc_lo_dmb_node *dmb_node)
+{
+	/* remove dmb from hash table */
+	write_lock_bh(&ldev->dmb_ht_lock);
+	hash_del(&dmb_node->list);
+	write_unlock_bh(&ldev->dmb_ht_lock);
+
+	clear_bit(dmb_node->sba_idx, ldev->sba_idx_mask);
+	kvfree(dmb_node->cpu_addr);
+	kfree(dmb_node);
+
+	if (atomic_dec_and_test(&ldev->dmb_cnt))
+		wake_up(&ldev->ldev_release);
+}
+
 static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
 {
 	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
 	struct smc_lo_dev *ldev = smcd->priv;
 
-	/* remove dmb from hash table */
-	write_lock_bh(&ldev->dmb_ht_lock);
+	/* find dmb from hash table */
+	read_lock_bh(&ldev->dmb_ht_lock);
 	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) {
 		if (tmp_node->token == dmb->dmb_tok) {
 			dmb_node = tmp_node;
@@ -124,16 +143,76 @@ static int smc_lo_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
 		}
 	}
 	if (!dmb_node) {
-		write_unlock_bh(&ldev->dmb_ht_lock);
+		read_unlock_bh(&ldev->dmb_ht_lock);
 		return -EINVAL;
 	}
-	hash_del(&dmb_node->list);
-	write_unlock_bh(&ldev->dmb_ht_lock);
+	read_unlock_bh(&ldev->dmb_ht_lock);
 
-	clear_bit(dmb_node->sba_idx, ldev->sba_idx_mask);
-	kfree(dmb_node->cpu_addr);
-	kfree(dmb_node);
+	if (refcount_dec_and_test(&dmb_node->refcnt))
+		__smc_lo_unregister_dmb(ldev, dmb_node);
+	return 0;
+}
+
+static int smc_lo_support_dmb_nocopy(struct smcd_dev *smcd)
+{
+	return SMC_LO_SUPPORT_NOCOPY;
+}
+
+static int smc_lo_attach_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
+{
+	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
+	struct smc_lo_dev *ldev = smcd->priv;
+
+	/* find dmb_node according to dmb->dmb_tok */
+	read_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb->dmb_tok) {
+		if (tmp_node->token == dmb->dmb_tok) {
+			dmb_node = tmp_node;
+			break;
+		}
+	}
+	if (!dmb_node) {
+		read_unlock_bh(&ldev->dmb_ht_lock);
+		return -EINVAL;
+	}
+	read_unlock_bh(&ldev->dmb_ht_lock);
+
+	if (!refcount_inc_not_zero(&dmb_node->refcnt))
+		/* the dmb is being unregistered, but has
+		 * not been removed from the hash table.
+		 */
+		return -EINVAL;
 
+	/* provide dmb information */
+	dmb->sba_idx = dmb_node->sba_idx;
+	dmb->dmb_tok = dmb_node->token;
+	dmb->cpu_addr = dmb_node->cpu_addr;
+	dmb->dma_addr = dmb_node->dma_addr;
+	dmb->dmb_len = dmb_node->len;
+	return 0;
+}
+
+static int smc_lo_detach_dmb(struct smcd_dev *smcd, u64 token)
+{
+	struct smc_lo_dmb_node *dmb_node = NULL, *tmp_node;
+	struct smc_lo_dev *ldev = smcd->priv;
+
+	/* find dmb_node according to dmb->dmb_tok */
+	read_lock_bh(&ldev->dmb_ht_lock);
+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, token) {
+		if (tmp_node->token == token) {
+			dmb_node = tmp_node;
+			break;
+		}
+	}
+	if (!dmb_node) {
+		read_unlock_bh(&ldev->dmb_ht_lock);
+		return -EINVAL;
+	}
+	read_unlock_bh(&ldev->dmb_ht_lock);
+
+	if (refcount_dec_and_test(&dmb_node->refcnt))
+		__smc_lo_unregister_dmb(ldev, dmb_node);
 	return 0;
 }
 
@@ -145,6 +224,12 @@ static int smc_lo_move_data(struct smcd_dev *smcd, u64 dmb_tok,
 	struct smc_lo_dev *ldev = smcd->priv;
 	struct smc_connection *conn;
 
+	if (!sf)
+		/* since sndbuf is merged with peer DMB, there is
+		 * no need to copy data from sndbuf to peer DMB.
+		 */
+		return 0;
+
 	read_lock_bh(&ldev->dmb_ht_lock);
 	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
 		if (tmp_node->token == dmb_tok) {
@@ -159,13 +244,10 @@ static int smc_lo_move_data(struct smcd_dev *smcd, u64 dmb_tok,
 	memcpy((char *)rmb_node->cpu_addr + offset, data, size);
 	read_unlock_bh(&ldev->dmb_ht_lock);
 
-	if (sf) {
-		conn = smcd->conn[rmb_node->sba_idx];
-		if (conn && !conn->killed)
-			tasklet_schedule(&conn->rx_tsklet);
-		else
-			return -EPIPE;
-	}
+	conn = smcd->conn[rmb_node->sba_idx];
+	if (!conn || conn->killed)
+		return -EPIPE;
+	tasklet_schedule(&conn->rx_tsklet);
 	return 0;
 }
 
@@ -197,6 +279,9 @@ static const struct smcd_ops lo_ops = {
 	.query_remote_gid = smc_lo_query_rgid,
 	.register_dmb = smc_lo_register_dmb,
 	.unregister_dmb = smc_lo_unregister_dmb,
+	.support_dmb_nocopy = smc_lo_support_dmb_nocopy,
+	.attach_dmb = smc_lo_attach_dmb,
+	.detach_dmb = smc_lo_detach_dmb,
 	.add_vlan_id		= NULL,
 	.del_vlan_id		= NULL,
 	.set_vlan_required	= NULL,
@@ -275,12 +360,17 @@ static int smc_lo_dev_init(struct smc_lo_dev *ldev)
 	smc_lo_generate_ids(ldev);
 	rwlock_init(&ldev->dmb_ht_lock);
 	hash_init(ldev->dmb_ht);
+	atomic_set(&ldev->dmb_cnt, 0);
+	init_waitqueue_head(&ldev->ldev_release);
+
 	return smcd_lo_register_dev(ldev);
 }
 
 static void smc_lo_dev_exit(struct smc_lo_dev *ldev)
 {
 	smcd_lo_unregister_dev(ldev);
+	if (atomic_read(&ldev->dmb_cnt))
+		wait_event(ldev->ldev_release, !atomic_read(&ldev->dmb_cnt));
 }
 
 static void smc_lo_dev_release(struct device *dev)
diff --git a/net/smc/smc_loopback.h b/net/smc/smc_loopback.h
index 4b36acbc27da..387407b8b385 100644
--- a/net/smc/smc_loopback.h
+++ b/net/smc/smc_loopback.h
@@ -30,6 +30,7 @@ struct smc_lo_dmb_node {
 	u32 sba_idx;
 	void *cpu_addr;
 	dma_addr_t dma_addr;
+	refcount_t refcnt;
 };
 
 struct smc_lo_dev {
@@ -37,9 +38,11 @@ struct smc_lo_dev {
 	struct device dev;
 	u16 chid;
 	struct smcd_gid local_gid;
+	atomic_t dmb_cnt;
 	rwlock_t dmb_ht_lock;
 	DECLARE_BITMAP(sba_idx_mask, SMC_LO_MAX_DMBS);
 	DECLARE_HASHTABLE(dmb_ht, SMC_LO_DMBS_HASH_BITS);
+	wait_queue_head_t ldev_release;
 };
 
 int smc_loopback_init(void);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration
  2024-04-14  4:02 ` [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration Wen Gu
@ 2024-04-15  8:41   ` Alexandra Winter
  2024-04-16 13:58     ` Wen Gu
  2024-04-25 11:29     ` Wenjia Zhang
  0 siblings, 2 replies; 27+ messages in thread
From: Alexandra Winter @ 2024-04-15  8:41 UTC (permalink / raw
  To: Wen Gu, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 14.04.24 06:02, Wen Gu wrote:
> The struct 'ism_client' is specialized for s390 platform firmware ISM.
> So replace it with 'void' to make SMCD DMB registration helper generic
> for both Emulated-ISM and existing ISM.
> 
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---

Just a thought:
The client concept is really specific to s390 platform firmware ISM.
So wouldn't it be nice to do something like:

diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
index 78cca4839a31..37dcdf2bc044 100644
--- a/drivers/s390/net/ism_drv.c
+++ b/drivers/s390/net/ism_drv.c
@@ -747,10 +747,9 @@ static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
        return ism_query_rgid(smcd->priv, rgid->gid, vid_valid, vid);
 }

-static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
-                            struct ism_client *client)
+static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
 {
-       return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, client);
+       return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, &smc_ism_client);
 }

 static int smcd_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)

--------------

This is not a real patch, just a sketch, but I hope you
get the idea.


This may be a step in the direction of moving the ism_client concept from 
net/smc/smc_ism.c to drivers/s390/net/ism*


I know that there are several dependencies to consider. 
And I haven't looked at the other patches in this series yet in detail, to see how you solve
things like smcd_register_dev. Seems like smcd_register_dmb() is the only one of the smcd_ops
that you need for loopback and uses ism_client.



Wenjia, Gerd, and others what do you think?

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 10/11] net/smc: adapt cursor update when sndbuf and peer DMB are merged
  2024-04-14  4:03 ` [PATCH net-next v6 10/11] net/smc: adapt cursor update when sndbuf and peer DMB are merged Wen Gu
@ 2024-04-16 11:05   ` Simon Horman
  2024-04-16 14:06     ` Wen Gu
  0 siblings, 1 reply; 27+ messages in thread
From: Simon Horman @ 2024-04-16 11:05 UTC (permalink / raw
  To: Wen Gu
  Cc: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka, borntraeger, svens, alibuda, tonylu,
	linux-kernel, linux-s390, netdev

On Sun, Apr 14, 2024 at 12:03:03PM +0800, Wen Gu wrote:
> If the local sndbuf shares the same physical memory with peer DMB,
> the cursor update processing needs to be adapted to ensure that the
> data to be consumed won't be overwritten.
> 
> So in this case, the fin_curs and sndbuf_space that were originally
> updated after sending the CDC message should be modified to not be
> update until the peer updates cons_curs.
> 
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>

...

> @@ -255,6 +256,14 @@ int smcd_cdc_msg_send(struct smc_connection *conn)
>  		return rc;
>  	smc_curs_copy(&conn->rx_curs_confirmed, &curs, conn);
>  	conn->local_rx_ctrl.prod_flags.cons_curs_upd_req = 0;
> +
> +	if (smc_ism_support_dmb_nocopy(conn->lgr->smcd))
> +		/* if local sndbuf shares the same memory region with
> +		 * peer DMB, then don't update the tx_curs_fin
> +		 * and sndbuf_space until peer has consumed the data.
> +		 */
> +		return rc;

Hi Wen Gu,

A minor nit from my side:

To my mind "return rc" implies returning an error value.
But here rc is 0, which based on the comment seems correct.
So perhaps it would be clearer to simply return 0.

Flagged by Smatch.

> +
>  	/* Calculate transmitted data and increment free send buffer space */
>  	diff = smc_curs_diff(conn->sndbuf_desc->len, &conn->tx_curs_fin,
>  			     &conn->tx_curs_sent);

...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration
  2024-04-15  8:41   ` Alexandra Winter
@ 2024-04-16 13:58     ` Wen Gu
  2024-04-25 11:29     ` Wenjia Zhang
  1 sibling, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-16 13:58 UTC (permalink / raw
  To: Alexandra Winter, twinkler, hca, gor, agordeev, davem, edumazet,
	kuba, pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 2024/4/15 16:41, Alexandra Winter wrote:
> 
> 
> On 14.04.24 06:02, Wen Gu wrote:
>> The struct 'ism_client' is specialized for s390 platform firmware ISM.
>> So replace it with 'void' to make SMCD DMB registration helper generic
>> for both Emulated-ISM and existing ISM.
>>
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> ---
> 
> Just a thought:
> The client concept is really specific to s390 platform firmware ISM.
> So wouldn't it be nice to do something like:
> 
> diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
> index 78cca4839a31..37dcdf2bc044 100644
> --- a/drivers/s390/net/ism_drv.c
> +++ b/drivers/s390/net/ism_drv.c
> @@ -747,10 +747,9 @@ static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
>          return ism_query_rgid(smcd->priv, rgid->gid, vid_valid, vid);
>   }
> 
> -static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
> -                            struct ism_client *client)
> +static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
>   {
> -       return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, client);
> +       return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, &smc_ism_client);
>   }
> 
>   static int smcd_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
> 
> --------------
> 
> This is not a real patch, just a sketch, but I hope you
> get the idea.
> 
> 
> This may be a step in the direction of moving the ism_client concept from
> net/smc/smc_ism.c to drivers/s390/net/ism*
> 
> 
> I know that there are several dependencies to consider.

Yeah.. I think so too. The move of ism_client concept may involve much work.

> And I haven't looked at the other patches in this series yet in detail, to see how you solve
> things like smcd_register_dev. Seems like smcd_register_dmb() is the only one of the smcd_ops
> that you need for loopback and uses ism_client.
> 

loopback-ism uses smcd_lo_register_dev instead. And yes, smcd_register_dmb() is
the only one of smcd_ops that use ism_client in its function argument.

> 
> 
> Wenjia, Gerd, and others what do you think?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 10/11] net/smc: adapt cursor update when sndbuf and peer DMB are merged
  2024-04-16 11:05   ` Simon Horman
@ 2024-04-16 14:06     ` Wen Gu
  0 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-16 14:06 UTC (permalink / raw
  To: Simon Horman
  Cc: wintera, twinkler, hca, gor, agordeev, davem, edumazet, kuba,
	pabeni, wenjia, jaka, borntraeger, svens, alibuda, tonylu,
	linux-kernel, linux-s390, netdev



On 2024/4/16 19:05, Simon Horman wrote:
> On Sun, Apr 14, 2024 at 12:03:03PM +0800, Wen Gu wrote:
>> If the local sndbuf shares the same physical memory with peer DMB,
>> the cursor update processing needs to be adapted to ensure that the
>> data to be consumed won't be overwritten.
>>
>> So in this case, the fin_curs and sndbuf_space that were originally
>> updated after sending the CDC message should be modified to not be
>> update until the peer updates cons_curs.
>>
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> 
> ...
> 
>> @@ -255,6 +256,14 @@ int smcd_cdc_msg_send(struct smc_connection *conn)
>>   		return rc;
>>   	smc_curs_copy(&conn->rx_curs_confirmed, &curs, conn);
>>   	conn->local_rx_ctrl.prod_flags.cons_curs_upd_req = 0;
>> +
>> +	if (smc_ism_support_dmb_nocopy(conn->lgr->smcd))
>> +		/* if local sndbuf shares the same memory region with
>> +		 * peer DMB, then don't update the tx_curs_fin
>> +		 * and sndbuf_space until peer has consumed the data.
>> +		 */
>> +		return rc;
> 
> Hi Wen Gu,
> 
> A minor nit from my side:
> 
> To my mind "return rc" implies returning an error value.
> But here rc is 0, which based on the comment seems correct.
> So perhaps it would be clearer to simply return 0.
> 
> Flagged by Smatch.
> 
>> +
>>   	/* Calculate transmitted data and increment free send buffer space */
>>   	diff = smc_curs_diff(conn->sndbuf_desc->len, &conn->tx_curs_fin,
>>   			     &conn->tx_curs_sent);
> 
> ...

OK. I will improve it and another 'return rc' at the end of
smcd_cdc_msg_send(). Thanks! Simon.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 02/11] net/smc: introduce loopback-ism for SMC intra-OS shortcut
  2024-04-14  4:02 ` [PATCH net-next v6 02/11] net/smc: introduce loopback-ism for SMC intra-OS shortcut Wen Gu
@ 2024-04-17 10:20   ` Gerd Bayer
  0 siblings, 0 replies; 27+ messages in thread
From: Gerd Bayer @ 2024-04-17 10:20 UTC (permalink / raw
  To: Wen Gu, wintera, twinkler, hca, gor, agordeev, davem, edumazet,
	kuba, pabeni, wenjia, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev

On Sun, 2024-04-14 at 12:02 +0800, Wen Gu wrote:
> This introduces a kind of Emulated-ISM device named loopback-ism for
> SMCv2.1. The loopback-ism device is currently exclusive for SMC
> usage, and aims to provide an SMC shortcut for sockets within the
> same kernel, leading to improved intra-OS traffic performance.
> Configuration of this feature is managed through the config SMC_LO.
> 
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>  net/smc/Kconfig        |  13 ++++
>  net/smc/Makefile       |   1 +
>  net/smc/af_smc.c       |  12 +++-
>  net/smc/smc_loopback.c | 156
> +++++++++++++++++++++++++++++++++++++++++
>  net/smc/smc_loopback.h |  43 ++++++++++++
>  5 files changed, 224 insertions(+), 1 deletion(-)
>  create mode 100644 net/smc/smc_loopback.c
>  create mode 100644 net/smc/smc_loopback.h
> 

Thanks Wen Gu,

this looks good to me now. A W=1 compile-test of the whole series with
SMC_LO undefined showed that there were no additional unresolved
symbols introduced.

Feel free to add my
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (10 preceding siblings ...)
  2024-04-14  4:03 ` [PATCH net-next v6 11/11] net/smc: implement DMB-merged operations of loopback-ism Wen Gu
@ 2024-04-17 15:36 ` Wenjia Zhang
  2024-04-23  6:45   ` Wen Gu
  2024-04-25 11:30 ` Wenjia Zhang
  2024-04-26  7:02 ` Jan Karcher
  13 siblings, 1 reply; 27+ messages in thread
From: Wenjia Zhang @ 2024-04-17 15:36 UTC (permalink / raw
  To: Wen Gu, wintera, twinkler, hca, gor, agordeev, davem, edumazet,
	kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 14.04.24 06:02, Wen Gu wrote:
> This patch set acts as the second part of the new version of [1] (The first
> part can be referred from [2]), the updated things of this version are listed
> at the end.
> 
> - Background
> 
> SMC-D is now used in IBM z with ISM function to optimize network interconnect
> for intra-CPC communications. Inspired by this, we try to make SMC-D available
> on the non-s390 architecture through a software-implemented Emulated-ISM device,
> that is the loopback-ism device here, to accelerate inter-process or
> inter-containers communication within the same OS instance.
> 
> - Design
> 
> This patch set includes 3 parts:
> 
>   - Patch #1: some prepare work for loopback-ism.
>   - Patch #2-#7: implement loopback-ism device and adapt SMC-D for it.
>     loopback-ism now serves only SMC and no userspace interfaces exposed.
>   - Patch #8-#11: memory copy optimization for intra-OS scenario.
> 
> The loopback-ism device is designed as an ISMv2 device and not be limited to
> a specific net namespace, ends of both inter-process connection (1/1' in diagram
> below) or inter-container connection (2/2' in diagram below) can find the same
> available loopback-ism and choose it during the CLC handshake.
> 
>   Container 1 (ns1)                              Container 2 (ns2)
>   +-----------------------------------------+    +-------------------------+
>   | +-------+      +-------+      +-------+ |    |        +-------+        |
>   | | App A |      | App B |      | App C | |    |        | App D |<-+     |
>   | +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
>   |     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
>   |  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
>   |     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
>   +---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
>                |   |           |                                  |
>   Kernel       |   |           |                                  |
>   +----+-------v---+-----------v----------------------------------+---+----+
>   |    |                            TCP                               |    |
>   |    |                                                              |    |
>   |    +--------------------------------------------------------------+    |
>   |                                                                        |
>   |                           +--------------+                             |
>   |                           | smc loopback |                             |
>   +---------------------------+--------------+-----------------------------+
> 
> loopback-ism device creates DMBs (shared memory) for each connection peer.
> Since data transfer occurs within the same kernel, the sndbuf of each peer
> is only a descriptor and point to the same memory region as peer DMB, so that
> the data copy from sndbuf to peer DMB can be avoided in loopback-ism case.
> 
>   Container 1 (ns1)                              Container 2 (ns2)
>   +-----------------------------------------+    +-------------------------+
>   | +-------+                               |    |        +-------+        |
>   | | App C |-----+                         |    |        | App D |        |
>   | +-------+     |                         |    |        +-^-----+        |
>   |               |                         |    |          |              |
>   |           (2) |                         |    |     (2') |              |
>   |               |                         |    |          |              |
>   +---------------|-------------------------+    +----------|--------------+
>                   |                                         |
>   Kernel          |                                         |
>   +---------------|-----------------------------------------|--------------+
>   | +--------+ +--v-----+                           +--------+ +--------+  |
>   | |dmb_desc| |snd_desc|                           |dmb_desc| |snd_desc|  |
>   | +-----|--+ +--|-----+                           +-----|--+ +--------+  |
>   | +-----|--+    |                                 +-----|--+             |
>   | | DMB C  |    +---------------------------------| DMB D  |             |
>   | +--------+                                      +--------+             |
>   |                                                                        |
>   |                           +--------------+                             |
>   |                           | smc loopback |                             |
>   +---------------------------+--------------+-----------------------------+
> 
> - Benchmark Test
> 
>   * Test environments:
>        - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
>        - SMC sndbuf/DMB size 1MB.
> 
>   * Test object:
>        - TCP: run on TCP loopback.
>        - SMC lo: run on SMC loopback-ism.
> 
> 1. ipc-benchmark (see [3])
> 
>   - ./<foo> -c 1000000 -s 100
> 
>                              TCP                  SMC-lo
> Message
> rate (msg/s)              79693                  148236(+86.01%)
> 
> 2. sockperf
> 
>   - serv: <smc_run> sockperf sr --tcp
>   - clnt: <smc_run> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
> 
>                              TCP                  SMC-lo
> Bandwidth(MBps)         4815.18                 8061.77(+67.42%)
> Latency(us)               6.176                   3.449(-44.15%)
> 
> 3. nginx/wrk
> 
>   - serv: <smc_run> nginx
>   - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
> 
>                             TCP                   SMC-lo
> Requests/s           196555.02                263270.95(+33.94%)
> 
> 4. redis-benchmark
> 
>   - serv: <smc_run> redis-server
>   - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
> 
>                             TCP                   SMC-lo
> GET(Requests/s)       88711.47                120048.02(+35.32%)
> SET(Requests/s)       89465.44                123152.71(+37.65%)
> 
> 
> Change log:
> 
> v6->RFC v5
> - Patch #2: make the use of CONFIG_SMC_LO cleaner.
> - Patch #5: mark some smcd_ops that loopback-ism doesn't support as
>    optional and check for the support when they are called.
> - Patch #7: keep loopback-ism at the beginning of the SMC-D device list.
> - Some expression changes in commit logs and comments.
> 
> RFC v5->RFC v4:
> Link: https://lore.kernel.org/netdev/20240324135522.108564-1-guwen@linux.alibaba.com/
> - Patch #2: minor changes in description of config SMC_LO and comments.
> - Patch #10: minor changes in comments and if(smc_ism_support_dmb_nocopy())
>    check in smcd_cdc_msg_send().
> - Patch #3: change smc_lo_generate_id() to smc_lo_generate_ids() and SMC_LO_CHID
>    to SMC_LO_RESERVED_CHID.
> - Patch #5: memcpy while holding the ldev->dmb_ht_lock.
> - Some expression changes in commit logs.
> 
> RFC v4->v3:
> Link: https://lore.kernel.org/netdev/20240317100545.96663-1-guwen@linux.alibaba.com/
> - The merge window of v6.9 is open, so post this series as an RFC.
> - Patch #6: since some information fed back by smc_nl_handle_smcd_dev() dose
>    not apply to Emulated-ISM (including loopback-ism here), loopback-ism is
>    not exposed through smc netlink for the time being. we may refactor this
>    part when smc netlink interface is updated.
> 
> v3->v2:
> Link: https://lore.kernel.org/netdev/20240312142743.41406-1-guwen@linux.alibaba.com/
> - Patch #11: use tasklet_schedule(&conn->rx_tsklet) instead of smcd_cdc_rx_handler()
>    to avoid possible recursive locking of conn->send_lock and use {read|write}_lock_bh()
>    to acquire dmb_ht_lock.
> 
> v2->v1:
> Link: https://lore.kernel.org/netdev/20240307095536.29648-1-guwen@linux.alibaba.com/
> - All the patches: changed the term virtual-ISM to Emulated-ISM as defined by SMCv2.1.
> - Patch #3: optimized the description of SMC_LO config. Avoid exposing loopback-ism
>    to sysfs and remove all the knobs until future definition clear.
> - Patch #3: try to make lockdep happy by using read_lock_bh() in smc_lo_move_data().
> - Patch #6: defaultly use physical contiguous DMB buffers.
> - Patch #11: defaultly enable DMB no-copy for loopback-ism and free the DMB in
>    unregister_dmb or detach_dmb when dmb_node->refcnt reaches 0, instead of using
>    wait_event to keep waiting in unregister_dmb.
> 
> v1->RFC:
> Link: https://lore.kernel.org/netdev/20240111120036.109903-1-guwen@linux.alibaba.com/
> - Patch #9: merge rx_bytes and tx_bytes as xfer_bytes statistics:
>    /sys/devices/virtual/smc/loopback-ism/xfer_bytes
> - Patch #10: add support_dmb_nocopy operation to check if SMC-D device supports
>    merging sndbuf with peer DMB.
> - Patch #13 & #14: introduce loopback-ism device control of DMB memory type and
>    control of whether to merge sndbuf and DMB. They can be respectively set by:
>    /sys/devices/virtual/smc/loopback-ism/dmb_type
>    /sys/devices/virtual/smc/loopback-ism/dmb_copy
>    The motivation for these two control is that a performance bottleneck was
>    found when using vzalloced DMB and sndbuf is merged with DMB, and there are
>    many CPUs and CONFIG_HARDENED_USERCOPY is set [4]. The bottleneck is caused
>    by the lock contention in vmap_area_lock [5] which is involved in memcpy_from_msg()
>    or memcpy_to_msg(). Currently, Uladzislau Rezki is working on mitigating the
>    vmap lock contention [6]. It has significant effects, but using virtual memory
>    still has additional overhead compared to using physical memory.
>    So this new version provides controls of dmb_type and dmb_copy to suit
>    different scenarios.
> - Some minor changes and comments improvements.
> 
> RFC->old version([1]):
> Link: https://lore.kernel.org/netdev/1702214654-32069-1-git-send-email-guwen@linux.alibaba.com/
> - Patch #1: improve the loopback-ism dump, it shows as follows now:
>    # smcd d
>    FID  Type  PCI-ID        PCHID  InUse  #LGs  PNET-ID
>    0000 0     loopback-ism  ffff   No        0
> - Patch #3: introduce the smc_ism_set_v2_capable() helper and set
>    smc_ism_v2_capable when ISMv2 or virtual ISM is registered,
>    regardless of whether there is already a device in smcd device list.
> - Patch #3: loopback-ism will be added into /sys/devices/virtual/smc/loopback-ism/.
> - Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active
>    to activate or deactivate the loopback-ism.
> - Patch #9: introduce the statistics of loopback-ism by
>    /sys/devices/virtual/smc/loopback-ism/{{tx|rx}_tytes|dmbs_cnt}.
> - Some minor changes and comments improvements.
> 
> [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
> [2] https://lore.kernel.org/netdev/20231219142616.80697-1-guwen@linux.alibaba.com/
> [3] https://github.com/goldsborough/ipc-bench
> [4] https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/
> [5] https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/
> [6] https://lore.kernel.org/all/20240102184633.748113-1-urezki@gmail.com/
> 
> Wen Gu (11):
>    net/smc: decouple ism_client from SMC-D DMB registration
>    net/smc: introduce loopback-ism for SMC intra-OS shortcut
>    net/smc: implement ID-related operations of loopback-ism
>    net/smc: implement DMB-related operations of loopback-ism
>    net/smc: mark optional smcd_ops and check for support when called
>    net/smc: ignore loopback-ism when dumping SMC-D devices
>    net/smc: register loopback-ism into SMC-D device list
>    net/smc: add operations to merge sndbuf with peer DMB
>    net/smc: {at|de}tach sndbuf to peer DMB if supported
>    net/smc: adapt cursor update when sndbuf and peer DMB are merged
>    net/smc: implement DMB-merged operations of loopback-ism
> 
>   drivers/s390/net/ism_drv.c |   2 +-
>   include/net/smc.h          |  21 +-
>   net/smc/Kconfig            |  13 ++
>   net/smc/Makefile           |   1 +
>   net/smc/af_smc.c           |  28 ++-
>   net/smc/smc_cdc.c          |  34 ++-
>   net/smc/smc_core.c         |  61 +++++-
>   net/smc/smc_core.h         |   1 +
>   net/smc/smc_ism.c          |  88 ++++++--
>   net/smc/smc_ism.h          |  10 +
>   net/smc/smc_loopback.c     | 427 +++++++++++++++++++++++++++++++++++++
>   net/smc/smc_loopback.h     |  62 ++++++
>   12 files changed, 721 insertions(+), 27 deletions(-)
>   create mode 100644 net/smc/smc_loopback.c
>   create mode 100644 net/smc/smc_loopback.h
> 
Still need some time to review the patches.

Thanks for your patience!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism
  2024-04-17 15:36 ` [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wenjia Zhang
@ 2024-04-23  6:45   ` Wen Gu
  0 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-23  6:45 UTC (permalink / raw
  To: Wenjia Zhang, wintera, twinkler, hca, gor, agordeev, davem,
	edumazet, kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 2024/4/17 23:36, Wenjia Zhang wrote:
> 
> 
> On 14.04.24 06:02, Wen Gu wrote:
>> This patch set acts as the second part of the new version of [1] (The first
>> part can be referred from [2]), the updated things of this version are listed
>> at the end.
>>
>>
> Still need some time to review the patches.
> 
> Thanks for your patience!

Hi Wenjia, is there any update on the review?

Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list
  2024-04-14  4:03 ` [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list Wen Gu
@ 2024-04-25 11:29   ` Wenjia Zhang
  2024-04-25 13:29     ` Alexandra Winter
  0 siblings, 1 reply; 27+ messages in thread
From: Wenjia Zhang @ 2024-04-25 11:29 UTC (permalink / raw
  To: Wen Gu, wintera, twinkler, hca, gor, agordeev, davem, edumazet,
	kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 14.04.24 06:03, Wen Gu wrote:
> After the loopback-ism device is ready, add it to the SMC-D device list
> as an ISMv2 device, and always keep it at the beginning to ensure it is
> preferred for providing a shortcut for data transfer within the same
> kernel.
> 
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>   net/smc/smc_ism.c      | 30 +++++++++++++++++++++---------
>   net/smc/smc_ism.h      |  1 +
>   net/smc/smc_loopback.c | 20 +++++++++++++-------
>   3 files changed, 35 insertions(+), 16 deletions(-)
> 
> diff --git a/net/smc/smc_ism.c b/net/smc/smc_ism.c
> index 188fd28423c2..6bed0a61b746 100644
> --- a/net/smc/smc_ism.c
> +++ b/net/smc/smc_ism.c
> @@ -91,6 +91,11 @@ bool smc_ism_is_v2_capable(void)
>   	return smc_ism_v2_capable;
>   }
>   
> +void smc_ism_set_v2_capable(void)
> +{
> +	smc_ism_v2_capable = true;
> +}
> +
>   /* Set a connection using this DMBE. */
>   void smc_ism_set_conn(struct smc_connection *conn)
>   {
> @@ -439,7 +444,7 @@ static struct smcd_dev *smcd_alloc_dev(struct device *parent, const char *name,
>   static void smcd_register_dev(struct ism_dev *ism)
>   {
>   	const struct smcd_ops *ops = ism_get_smcd_ops();
> -	struct smcd_dev *smcd;
> +	struct smcd_dev *smcd, *fentry;
>   
>   	if (!ops)
>   		return;
> @@ -454,16 +459,23 @@ static void smcd_register_dev(struct ism_dev *ism)
>   	if (smc_pnetid_by_dev_port(&ism->pdev->dev, 0, smcd->pnetid))
>   		smc_pnetid_by_table_smcd(smcd);
>   
> +	if (smcd->ops->supports_v2())
> +		smc_ism_set_v2_capable();
>   	mutex_lock(&smcd_dev_list.mutex);
> -	if (list_empty(&smcd_dev_list.list)) {
> -		if (smcd->ops->supports_v2())
> -			smc_ism_v2_capable = true;
> -	}
> -	/* sort list: devices without pnetid before devices with pnetid */
> -	if (smcd->pnetid[0])
> +	/* sort list:
> +	 * - devices without pnetid before devices with pnetid;
> +	 * - loopback-ism always at the very beginning;
> +	 */
> +	if (!smcd->pnetid[0]) {
> +		fentry = list_first_entry_or_null(&smcd_dev_list.list,
> +						  struct smcd_dev, list);
> +		if (fentry && smc_ism_is_loopback(fentry))
> +			list_add(&smcd->list, &fentry->list);
> +		else
> +			list_add(&smcd->list, &smcd_dev_list.list);
> +	} else {
>   		list_add_tail(&smcd->list, &smcd_dev_list.list);
> -	else
> -		list_add(&smcd->list, &smcd_dev_list.list);
> +	}

Nit: here the pair of curly brackets are unnecessary.

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration
  2024-04-15  8:41   ` Alexandra Winter
  2024-04-16 13:58     ` Wen Gu
@ 2024-04-25 11:29     ` Wenjia Zhang
  1 sibling, 0 replies; 27+ messages in thread
From: Wenjia Zhang @ 2024-04-25 11:29 UTC (permalink / raw
  To: Alexandra Winter, Wen Gu, twinkler, hca, gor, agordeev, davem,
	edumazet, kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 15.04.24 10:41, Alexandra Winter wrote:
> 
> 
> On 14.04.24 06:02, Wen Gu wrote:
>> The struct 'ism_client' is specialized for s390 platform firmware ISM.
>> So replace it with 'void' to make SMCD DMB registration helper generic
>> for both Emulated-ISM and existing ISM.
>>
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> ---
> 
> Just a thought:
> The client concept is really specific to s390 platform firmware ISM.
> So wouldn't it be nice to do something like:
> 
> diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
> index 78cca4839a31..37dcdf2bc044 100644
> --- a/drivers/s390/net/ism_drv.c
> +++ b/drivers/s390/net/ism_drv.c
> @@ -747,10 +747,9 @@ static int smcd_query_rgid(struct smcd_dev *smcd, struct smcd_gid *rgid,
>          return ism_query_rgid(smcd->priv, rgid->gid, vid_valid, vid);
>   }
> 
> -static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb,
> -                            struct ism_client *client)
> +static int smcd_register_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
>   {
> -       return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, client);
> +       return ism_register_dmb(smcd->priv, (struct ism_dmb *)dmb, &smc_ism_client);
>   }
> 
>   static int smcd_unregister_dmb(struct smcd_dev *smcd, struct smcd_dmb *dmb)
> 
> --------------
> 
> This is not a real patch, just a sketch, but I hope you
> get the idea.
> 
> 
> This may be a step in the direction of moving the ism_client concept from
> net/smc/smc_ism.c to drivers/s390/net/ism*
> 
> 
> I know that there are several dependencies to consider.
> And I haven't looked at the other patches in this series yet in detail, to see how you solve
> things like smcd_register_dev. Seems like smcd_register_dmb() is the only one of the smcd_ops
> that you need for loopback and uses ism_client.
> 
> 
> 
> Wenjia, Gerd, and others what do you think?


IMO, the idea is not bad, but it should not belong to this patch series.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (11 preceding siblings ...)
  2024-04-17 15:36 ` [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wenjia Zhang
@ 2024-04-25 11:30 ` Wenjia Zhang
  2024-04-26  7:02 ` Jan Karcher
  13 siblings, 0 replies; 27+ messages in thread
From: Wenjia Zhang @ 2024-04-25 11:30 UTC (permalink / raw
  To: Wen Gu, wintera, twinkler, hca, gor, agordeev, davem, edumazet,
	kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 14.04.24 06:02, Wen Gu wrote:
> This patch set acts as the second part of the new version of [1] (The first
> part can be referred from [2]), the updated things of this version are listed
> at the end.
> 
> - Background
> 
> SMC-D is now used in IBM z with ISM function to optimize network interconnect
> for intra-CPC communications. Inspired by this, we try to make SMC-D available
> on the non-s390 architecture through a software-implemented Emulated-ISM device,
> that is the loopback-ism device here, to accelerate inter-process or
> inter-containers communication within the same OS instance.
> 
> - Design
> 
> This patch set includes 3 parts:
> 
>   - Patch #1: some prepare work for loopback-ism.
>   - Patch #2-#7: implement loopback-ism device and adapt SMC-D for it.
>     loopback-ism now serves only SMC and no userspace interfaces exposed.
>   - Patch #8-#11: memory copy optimization for intra-OS scenario.
> 
> The loopback-ism device is designed as an ISMv2 device and not be limited to
> a specific net namespace, ends of both inter-process connection (1/1' in diagram
> below) or inter-container connection (2/2' in diagram below) can find the same
> available loopback-ism and choose it during the CLC handshake.
> 
>   Container 1 (ns1)                              Container 2 (ns2)
>   +-----------------------------------------+    +-------------------------+
>   | +-------+      +-------+      +-------+ |    |        +-------+        |
>   | | App A |      | App B |      | App C | |    |        | App D |<-+     |
>   | +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
>   |     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
>   |  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
>   |     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
>   +---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
>                |   |           |                                  |
>   Kernel       |   |           |                                  |
>   +----+-------v---+-----------v----------------------------------+---+----+
>   |    |                            TCP                               |    |
>   |    |                                                              |    |
>   |    +--------------------------------------------------------------+    |
>   |                                                                        |
>   |                           +--------------+                             |
>   |                           | smc loopback |                             |
>   +---------------------------+--------------+-----------------------------+
> 
> loopback-ism device creates DMBs (shared memory) for each connection peer.
> Since data transfer occurs within the same kernel, the sndbuf of each peer
> is only a descriptor and point to the same memory region as peer DMB, so that
> the data copy from sndbuf to peer DMB can be avoided in loopback-ism case.
> 
>   Container 1 (ns1)                              Container 2 (ns2)
>   +-----------------------------------------+    +-------------------------+
>   | +-------+                               |    |        +-------+        |
>   | | App C |-----+                         |    |        | App D |        |
>   | +-------+     |                         |    |        +-^-----+        |
>   |               |                         |    |          |              |
>   |           (2) |                         |    |     (2') |              |
>   |               |                         |    |          |              |
>   +---------------|-------------------------+    +----------|--------------+
>                   |                                         |
>   Kernel          |                                         |
>   +---------------|-----------------------------------------|--------------+
>   | +--------+ +--v-----+                           +--------+ +--------+  |
>   | |dmb_desc| |snd_desc|                           |dmb_desc| |snd_desc|  |
>   | +-----|--+ +--|-----+                           +-----|--+ +--------+  |
>   | +-----|--+    |                                 +-----|--+             |
>   | | DMB C  |    +---------------------------------| DMB D  |             |
>   | +--------+                                      +--------+             |
>   |                                                                        |
>   |                           +--------------+                             |
>   |                           | smc loopback |                             |
>   +---------------------------+--------------+-----------------------------+
> 
> - Benchmark Test
> 
>   * Test environments:
>        - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
>        - SMC sndbuf/DMB size 1MB.
> 
>   * Test object:
>        - TCP: run on TCP loopback.
>        - SMC lo: run on SMC loopback-ism.
> 
> 1. ipc-benchmark (see [3])
> 
>   - ./<foo> -c 1000000 -s 100
> 
>                              TCP                  SMC-lo
> Message
> rate (msg/s)              79693                  148236(+86.01%)
> 
> 2. sockperf
> 
>   - serv: <smc_run> sockperf sr --tcp
>   - clnt: <smc_run> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
> 
>                              TCP                  SMC-lo
> Bandwidth(MBps)         4815.18                 8061.77(+67.42%)
> Latency(us)               6.176                   3.449(-44.15%)
> 
> 3. nginx/wrk
> 
>   - serv: <smc_run> nginx
>   - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
> 
>                             TCP                   SMC-lo
> Requests/s           196555.02                263270.95(+33.94%)
> 
> 4. redis-benchmark
> 
>   - serv: <smc_run> redis-server
>   - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
> 
>                             TCP                   SMC-lo
> GET(Requests/s)       88711.47                120048.02(+35.32%)
> SET(Requests/s)       89465.44                123152.71(+37.65%)
> 
> 
> Change log:
> 
> v6->RFC v5
> - Patch #2: make the use of CONFIG_SMC_LO cleaner.
> - Patch #5: mark some smcd_ops that loopback-ism doesn't support as
>    optional and check for the support when they are called.
> - Patch #7: keep loopback-ism at the beginning of the SMC-D device list.
> - Some expression changes in commit logs and comments.
> 
> RFC v5->RFC v4:
> Link: https://lore.kernel.org/netdev/20240324135522.108564-1-guwen@linux.alibaba.com/
> - Patch #2: minor changes in description of config SMC_LO and comments.
> - Patch #10: minor changes in comments and if(smc_ism_support_dmb_nocopy())
>    check in smcd_cdc_msg_send().
> - Patch #3: change smc_lo_generate_id() to smc_lo_generate_ids() and SMC_LO_CHID
>    to SMC_LO_RESERVED_CHID.
> - Patch #5: memcpy while holding the ldev->dmb_ht_lock.
> - Some expression changes in commit logs.
> 
> RFC v4->v3:
> Link: https://lore.kernel.org/netdev/20240317100545.96663-1-guwen@linux.alibaba.com/
> - The merge window of v6.9 is open, so post this series as an RFC.
> - Patch #6: since some information fed back by smc_nl_handle_smcd_dev() dose
>    not apply to Emulated-ISM (including loopback-ism here), loopback-ism is
>    not exposed through smc netlink for the time being. we may refactor this
>    part when smc netlink interface is updated.
> 
> v3->v2:
> Link: https://lore.kernel.org/netdev/20240312142743.41406-1-guwen@linux.alibaba.com/
> - Patch #11: use tasklet_schedule(&conn->rx_tsklet) instead of smcd_cdc_rx_handler()
>    to avoid possible recursive locking of conn->send_lock and use {read|write}_lock_bh()
>    to acquire dmb_ht_lock.
> 
> v2->v1:
> Link: https://lore.kernel.org/netdev/20240307095536.29648-1-guwen@linux.alibaba.com/
> - All the patches: changed the term virtual-ISM to Emulated-ISM as defined by SMCv2.1.
> - Patch #3: optimized the description of SMC_LO config. Avoid exposing loopback-ism
>    to sysfs and remove all the knobs until future definition clear.
> - Patch #3: try to make lockdep happy by using read_lock_bh() in smc_lo_move_data().
> - Patch #6: defaultly use physical contiguous DMB buffers.
> - Patch #11: defaultly enable DMB no-copy for loopback-ism and free the DMB in
>    unregister_dmb or detach_dmb when dmb_node->refcnt reaches 0, instead of using
>    wait_event to keep waiting in unregister_dmb.
> 
> v1->RFC:
> Link: https://lore.kernel.org/netdev/20240111120036.109903-1-guwen@linux.alibaba.com/
> - Patch #9: merge rx_bytes and tx_bytes as xfer_bytes statistics:
>    /sys/devices/virtual/smc/loopback-ism/xfer_bytes
> - Patch #10: add support_dmb_nocopy operation to check if SMC-D device supports
>    merging sndbuf with peer DMB.
> - Patch #13 & #14: introduce loopback-ism device control of DMB memory type and
>    control of whether to merge sndbuf and DMB. They can be respectively set by:
>    /sys/devices/virtual/smc/loopback-ism/dmb_type
>    /sys/devices/virtual/smc/loopback-ism/dmb_copy
>    The motivation for these two control is that a performance bottleneck was
>    found when using vzalloced DMB and sndbuf is merged with DMB, and there are
>    many CPUs and CONFIG_HARDENED_USERCOPY is set [4]. The bottleneck is caused
>    by the lock contention in vmap_area_lock [5] which is involved in memcpy_from_msg()
>    or memcpy_to_msg(). Currently, Uladzislau Rezki is working on mitigating the
>    vmap lock contention [6]. It has significant effects, but using virtual memory
>    still has additional overhead compared to using physical memory.
>    So this new version provides controls of dmb_type and dmb_copy to suit
>    different scenarios.
> - Some minor changes and comments improvements.
> 
> RFC->old version([1]):
> Link: https://lore.kernel.org/netdev/1702214654-32069-1-git-send-email-guwen@linux.alibaba.com/
> - Patch #1: improve the loopback-ism dump, it shows as follows now:
>    # smcd d
>    FID  Type  PCI-ID        PCHID  InUse  #LGs  PNET-ID
>    0000 0     loopback-ism  ffff   No        0
> - Patch #3: introduce the smc_ism_set_v2_capable() helper and set
>    smc_ism_v2_capable when ISMv2 or virtual ISM is registered,
>    regardless of whether there is already a device in smcd device list.
> - Patch #3: loopback-ism will be added into /sys/devices/virtual/smc/loopback-ism/.
> - Patch #8: introduce the runtime switch /sys/devices/virtual/smc/loopback-ism/active
>    to activate or deactivate the loopback-ism.
> - Patch #9: introduce the statistics of loopback-ism by
>    /sys/devices/virtual/smc/loopback-ism/{{tx|rx}_tytes|dmbs_cnt}.
> - Some minor changes and comments improvements.
> 
> [1] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/
> [2] https://lore.kernel.org/netdev/20231219142616.80697-1-guwen@linux.alibaba.com/
> [3] https://github.com/goldsborough/ipc-bench
> [4] https://lore.kernel.org/all/3189e342-c38f-6076-b730-19a6efd732a5@linux.alibaba.com/
> [5] https://lore.kernel.org/all/238e63cd-e0e8-4fbf-852f-bc4d5bc35d5a@linux.alibaba.com/
> [6] https://lore.kernel.org/all/20240102184633.748113-1-urezki@gmail.com/
> 
> Wen Gu (11):
>    net/smc: decouple ism_client from SMC-D DMB registration
>    net/smc: introduce loopback-ism for SMC intra-OS shortcut
>    net/smc: implement ID-related operations of loopback-ism
>    net/smc: implement DMB-related operations of loopback-ism
>    net/smc: mark optional smcd_ops and check for support when called
>    net/smc: ignore loopback-ism when dumping SMC-D devices
>    net/smc: register loopback-ism into SMC-D device list
>    net/smc: add operations to merge sndbuf with peer DMB
>    net/smc: {at|de}tach sndbuf to peer DMB if supported
>    net/smc: adapt cursor update when sndbuf and peer DMB are merged
>    net/smc: implement DMB-merged operations of loopback-ism
> 
>   drivers/s390/net/ism_drv.c |   2 +-
>   include/net/smc.h          |  21 +-
>   net/smc/Kconfig            |  13 ++
>   net/smc/Makefile           |   1 +
>   net/smc/af_smc.c           |  28 ++-
>   net/smc/smc_cdc.c          |  34 ++-
>   net/smc/smc_core.c         |  61 +++++-
>   net/smc/smc_core.h         |   1 +
>   net/smc/smc_ism.c          |  88 ++++++--
>   net/smc/smc_ism.h          |  10 +
>   net/smc/smc_loopback.c     | 427 +++++++++++++++++++++++++++++++++++++
>   net/smc/smc_loopback.h     |  62 ++++++
>   12 files changed, 721 insertions(+), 27 deletions(-)
>   create mode 100644 net/smc/smc_loopback.c
>   create mode 100644 net/smc/smc_loopback.h
> 

Hi Wen,

They look good to me. Thank you again for the effort and the patience!

Here is my reviewed-by for the whole patches series:
Reviewed-by: Wenjia Zhang <wenjia@linux.ibm.com>

Thanks,
Wenjia

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list
  2024-04-25 11:29   ` Wenjia Zhang
@ 2024-04-25 13:29     ` Alexandra Winter
  2024-04-26  8:04       ` Wenjia Zhang
  0 siblings, 1 reply; 27+ messages in thread
From: Alexandra Winter @ 2024-04-25 13:29 UTC (permalink / raw
  To: Wenjia Zhang, Wen Gu, twinkler, hca, gor, agordeev, davem,
	edumazet, kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 25.04.24 13:29, Wenjia Zhang wrote:
>> +    if (!smcd->pnetid[0]) {
>> +        fentry = list_first_entry_or_null(&smcd_dev_list.list,
>> +                          struct smcd_dev, list);
>> +        if (fentry && smc_ism_is_loopback(fentry))
>> +            list_add(&smcd->list, &fentry->list);
>> +        else
>> +            list_add(&smcd->list, &smcd_dev_list.list);
>> +    } else {
>>           list_add_tail(&smcd->list, &smcd_dev_list.list);
>> -    else
>> -        list_add(&smcd->list, &smcd_dev_list.list);
>> +    }
> 
> Nit: here the pair of curly brackets are unnecessary.

Actually
https://www.kernel.org/doc/html/latest/process/coding-style.html#codingstyle
tells you to use those braces.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism
  2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
                   ` (12 preceding siblings ...)
  2024-04-25 11:30 ` Wenjia Zhang
@ 2024-04-26  7:02 ` Jan Karcher
  2024-04-26  8:17   ` Wen Gu
  13 siblings, 1 reply; 27+ messages in thread
From: Jan Karcher @ 2024-04-26  7:02 UTC (permalink / raw
  To: Wen Gu, wintera, twinkler, hca, gor, agordeev, davem, edumazet,
	kuba, pabeni, wenjia
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 14/04/2024 06:02, Wen Gu wrote:
> This patch set acts as the second part of the new version of [1] (The first
> part can be referred from [2]), the updated things of this version are listed
> at the end.
> 
> - Background
> 
> SMC-D is now used in IBM z with ISM function to optimize network interconnect
> for intra-CPC communications. Inspired by this, we try to make SMC-D available
> on the non-s390 architecture through a software-implemented Emulated-ISM device,
> that is the loopback-ism device here, to accelerate inter-process or
> inter-containers communication within the same OS instance.
> 
> - Design
> 
> This patch set includes 3 parts:
> 
>   - Patch #1: some prepare work for loopback-ism.
>   - Patch #2-#7: implement loopback-ism device and adapt SMC-D for it.
>     loopback-ism now serves only SMC and no userspace interfaces exposed.
>   - Patch #8-#11: memory copy optimization for intra-OS scenario.
> 
> The loopback-ism device is designed as an ISMv2 device and not be limited to
> a specific net namespace, ends of both inter-process connection (1/1' in diagram
> below) or inter-container connection (2/2' in diagram below) can find the same
> available loopback-ism and choose it during the CLC handshake.
> 
>   Container 1 (ns1)                              Container 2 (ns2)
>   +-----------------------------------------+    +-------------------------+
>   | +-------+      +-------+      +-------+ |    |        +-------+        |
>   | | App A |      | App B |      | App C | |    |        | App D |<-+     |
>   | +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
>   |     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
>   |  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
>   |     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
>   +---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
>                |   |           |                                  |
>   Kernel       |   |           |                                  |
>   +----+-------v---+-----------v----------------------------------+---+----+
>   |    |                            TCP                               |    |
>   |    |                                                              |    |
>   |    +--------------------------------------------------------------+    |
>   |                                                                        |
>   |                           +--------------+                             |
>   |                           | smc loopback |                             |
>   +---------------------------+--------------+-----------------------------+
> 
> loopback-ism device creates DMBs (shared memory) for each connection peer.
> Since data transfer occurs within the same kernel, the sndbuf of each peer
> is only a descriptor and point to the same memory region as peer DMB, so that
> the data copy from sndbuf to peer DMB can be avoided in loopback-ism case.
> 
>   Container 1 (ns1)                              Container 2 (ns2)
>   +-----------------------------------------+    +-------------------------+
>   | +-------+                               |    |        +-------+        |
>   | | App C |-----+                         |    |        | App D |        |
>   | +-------+     |                         |    |        +-^-----+        |
>   |               |                         |    |          |              |
>   |           (2) |                         |    |     (2') |              |
>   |               |                         |    |          |              |
>   +---------------|-------------------------+    +----------|--------------+
>                   |                                         |
>   Kernel          |                                         |
>   +---------------|-----------------------------------------|--------------+
>   | +--------+ +--v-----+                           +--------+ +--------+  |
>   | |dmb_desc| |snd_desc|                           |dmb_desc| |snd_desc|  |
>   | +-----|--+ +--|-----+                           +-----|--+ +--------+  |
>   | +-----|--+    |                                 +-----|--+             |
>   | | DMB C  |    +---------------------------------| DMB D  |             |
>   | +--------+                                      +--------+             |
>   |                                                                        |
>   |                           +--------------+                             |
>   |                           | smc loopback |                             |
>   +---------------------------+--------------+-----------------------------+
> 
> - Benchmark Test
> 
>   * Test environments:
>        - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
>        - SMC sndbuf/DMB size 1MB.
> 
>   * Test object:
>        - TCP: run on TCP loopback.
>        - SMC lo: run on SMC loopback-ism.
> 
> 1. ipc-benchmark (see [3])
> 
>   - ./<foo> -c 1000000 -s 100
> 
>                              TCP                  SMC-lo
> Message
> rate (msg/s)              79693                  148236(+86.01%)
> 
> 2. sockperf
> 
>   - serv: <smc_run> sockperf sr --tcp
>   - clnt: <smc_run> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
> 
>                              TCP                  SMC-lo
> Bandwidth(MBps)         4815.18                 8061.77(+67.42%)
> Latency(us)               6.176                   3.449(-44.15%)
> 
> 3. nginx/wrk
> 
>   - serv: <smc_run> nginx
>   - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
> 
>                             TCP                   SMC-lo
> Requests/s           196555.02                263270.95(+33.94%)
> 
> 4. redis-benchmark
> 
>   - serv: <smc_run> redis-server
>   - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
> 
>                             TCP                   SMC-lo
> GET(Requests/s)       88711.47                120048.02(+35.32%)
> SET(Requests/s)       89465.44                123152.71(+37.65%)
> 
> 

Hi Wen Gu,

I did run the tests again with the v6 and reviewed the patchset. If you 
decide to address Simons nit feel free to add my:

Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com>

Thanks for your effort and contribution.
- J

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list
  2024-04-25 13:29     ` Alexandra Winter
@ 2024-04-26  8:04       ` Wenjia Zhang
  2024-04-26  8:09         ` Wen Gu
  0 siblings, 1 reply; 27+ messages in thread
From: Wenjia Zhang @ 2024-04-26  8:04 UTC (permalink / raw
  To: Alexandra Winter, Wen Gu, twinkler, hca, gor, agordeev, davem,
	edumazet, kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 25.04.24 15:29, Alexandra Winter wrote:
> 
> 
> On 25.04.24 13:29, Wenjia Zhang wrote:
>>> +    if (!smcd->pnetid[0]) {
>>> +        fentry = list_first_entry_or_null(&smcd_dev_list.list,
>>> +                          struct smcd_dev, list);
>>> +        if (fentry && smc_ism_is_loopback(fentry))
>>> +            list_add(&smcd->list, &fentry->list);
>>> +        else
>>> +            list_add(&smcd->list, &smcd_dev_list.list);
>>> +    } else {
>>>            list_add_tail(&smcd->list, &smcd_dev_list.list);
>>> -    else
>>> -        list_add(&smcd->list, &smcd_dev_list.list);
>>> +    }
>>
>> Nit: here the pair of curly brackets are unnecessary.
> 
> Actually
> https://www.kernel.org/doc/html/latest/process/coding-style.html#codingstyle
> tells you to use those braces.
Thanks, @Alexandra!
Then @Wen, forget my comments on this pls!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list
  2024-04-26  8:04       ` Wenjia Zhang
@ 2024-04-26  8:09         ` Wen Gu
  0 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-26  8:09 UTC (permalink / raw
  To: Wenjia Zhang, Alexandra Winter, twinkler, hca, gor, agordeev,
	davem, edumazet, kuba, pabeni, jaka
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 2024/4/26 16:04, Wenjia Zhang wrote:
> 
> 
> On 25.04.24 15:29, Alexandra Winter wrote:
>>
>>
>> On 25.04.24 13:29, Wenjia Zhang wrote:
>>>> +    if (!smcd->pnetid[0]) {
>>>> +        fentry = list_first_entry_or_null(&smcd_dev_list.list,
>>>> +                          struct smcd_dev, list);
>>>> +        if (fentry && smc_ism_is_loopback(fentry))
>>>> +            list_add(&smcd->list, &fentry->list);
>>>> +        else
>>>> +            list_add(&smcd->list, &smcd_dev_list.list);
>>>> +    } else {
>>>>            list_add_tail(&smcd->list, &smcd_dev_list.list);
>>>> -    else
>>>> -        list_add(&smcd->list, &smcd_dev_list.list);
>>>> +    }
>>>
>>> Nit: here the pair of curly brackets are unnecessary.
>>
>> Actually
>> https://www.kernel.org/doc/html/latest/process/coding-style.html#codingstyle
>> tells you to use those braces.
> Thanks, @Alexandra!
> Then @Wen, forget my comments on this pls!

OK, Wenjia. Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism
  2024-04-26  7:02 ` Jan Karcher
@ 2024-04-26  8:17   ` Wen Gu
  0 siblings, 0 replies; 27+ messages in thread
From: Wen Gu @ 2024-04-26  8:17 UTC (permalink / raw
  To: Jan Karcher, wintera, twinkler, hca, gor, agordeev, davem,
	edumazet, kuba, pabeni, wenjia
  Cc: borntraeger, svens, alibuda, tonylu, linux-kernel, linux-s390,
	netdev



On 2024/4/26 15:02, Jan Karcher wrote:
> 
> 
> On 14/04/2024 06:02, Wen Gu wrote:
>> This patch set acts as the second part of the new version of [1] (The first
>> part can be referred from [2]), the updated things of this version are listed
>> at the end.
>>
>> - Background
>>
>> SMC-D is now used in IBM z with ISM function to optimize network interconnect
>> for intra-CPC communications. Inspired by this, we try to make SMC-D available
>> on the non-s390 architecture through a software-implemented Emulated-ISM device,
>> that is the loopback-ism device here, to accelerate inter-process or
>> inter-containers communication within the same OS instance.
>>
>> - Design
>>
>> This patch set includes 3 parts:
>>
>>   - Patch #1: some prepare work for loopback-ism.
>>   - Patch #2-#7: implement loopback-ism device and adapt SMC-D for it.
>>     loopback-ism now serves only SMC and no userspace interfaces exposed.
>>   - Patch #8-#11: memory copy optimization for intra-OS scenario.
>>
>> The loopback-ism device is designed as an ISMv2 device and not be limited to
>> a specific net namespace, ends of both inter-process connection (1/1' in diagram
>> below) or inter-container connection (2/2' in diagram below) can find the same
>> available loopback-ism and choose it during the CLC handshake.
>>
>>   Container 1 (ns1)                              Container 2 (ns2)
>>   +-----------------------------------------+    +-------------------------+
>>   | +-------+      +-------+      +-------+ |    |        +-------+        |
>>   | | App A |      | App B |      | App C | |    |        | App D |<-+     |
>>   | +-------+      +---^---+      +-------+ |    |        +-------+  |(2') |
>>   |     |127.0.0.1 (1')|             |192.168.0.11       192.168.0.12|     |
>>   |  (1)|   +--------+ | +--------+  |(2)   |    | +--------+   +--------+ |
>>   |     `-->|   lo   |-` |  eth0  |<-`      |    | |   lo   |   |  eth0  | |
>>   +---------+--|---^-+---+-----|--+---------+    +-+--------+---+-^------+-+
>>                |   |           |                                  |
>>   Kernel       |   |           |                                  |
>>   +----+-------v---+-----------v----------------------------------+---+----+
>>   |    |                            TCP                               |    |
>>   |    |                                                              |    |
>>   |    +--------------------------------------------------------------+    |
>>   |                                                                        |
>>   |                           +--------------+                             |
>>   |                           | smc loopback |                             |
>>   +---------------------------+--------------+-----------------------------+
>>
>> loopback-ism device creates DMBs (shared memory) for each connection peer.
>> Since data transfer occurs within the same kernel, the sndbuf of each peer
>> is only a descriptor and point to the same memory region as peer DMB, so that
>> the data copy from sndbuf to peer DMB can be avoided in loopback-ism case.
>>
>>   Container 1 (ns1)                              Container 2 (ns2)
>>   +-----------------------------------------+    +-------------------------+
>>   | +-------+                               |    |        +-------+        |
>>   | | App C |-----+                         |    |        | App D |        |
>>   | +-------+     |                         |    |        +-^-----+        |
>>   |               |                         |    |          |              |
>>   |           (2) |                         |    |     (2') |              |
>>   |               |                         |    |          |              |
>>   +---------------|-------------------------+    +----------|--------------+
>>                   |                                         |
>>   Kernel          |                                         |
>>   +---------------|-----------------------------------------|--------------+
>>   | +--------+ +--v-----+                           +--------+ +--------+  |
>>   | |dmb_desc| |snd_desc|                           |dmb_desc| |snd_desc|  |
>>   | +-----|--+ +--|-----+                           +-----|--+ +--------+  |
>>   | +-----|--+    |                                 +-----|--+             |
>>   | | DMB C  |    +---------------------------------| DMB D  |             |
>>   | +--------+                                      +--------+             |
>>   |                                                                        |
>>   |                           +--------------+                             |
>>   |                           | smc loopback |                             |
>>   +---------------------------+--------------+-----------------------------+
>>
>> - Benchmark Test
>>
>>   * Test environments:
>>        - VM with Intel Xeon Platinum 8 core 2.50GHz, 16 GiB mem.
>>        - SMC sndbuf/DMB size 1MB.
>>
>>   * Test object:
>>        - TCP: run on TCP loopback.
>>        - SMC lo: run on SMC loopback-ism.
>>
>> 1. ipc-benchmark (see [3])
>>
>>   - ./<foo> -c 1000000 -s 100
>>
>>                              TCP                  SMC-lo
>> Message
>> rate (msg/s)              79693                  148236(+86.01%)
>>
>> 2. sockperf
>>
>>   - serv: <smc_run> sockperf sr --tcp
>>   - clnt: <smc_run> sockperf { tp | pp } --tcp --msg-size={ 64000 for tp | 14 for pp } -i 127.0.0.1 -t 30
>>
>>                              TCP                  SMC-lo
>> Bandwidth(MBps)         4815.18                 8061.77(+67.42%)
>> Latency(us)               6.176                   3.449(-44.15%)
>>
>> 3. nginx/wrk
>>
>>   - serv: <smc_run> nginx
>>   - clnt: <smc_run> wrk -t 8 -c 1000 -d 30 http://127.0.0.1:80
>>
>>                             TCP                   SMC-lo
>> Requests/s           196555.02                263270.95(+33.94%)
>>
>> 4. redis-benchmark
>>
>>   - serv: <smc_run> redis-server
>>   - clnt: <smc_run> redis-benchmark -h 127.0.0.1 -q -t set,get -n 400000 -c 200 -d 1024
>>
>>                             TCP                   SMC-lo
>> GET(Requests/s)       88711.47                120048.02(+35.32%)
>> SET(Requests/s)       89465.44                123152.71(+37.65%)
>>
>>
> 
> Hi Wen Gu,
> 
> I did run the tests again with the v6 and reviewed the patchset. If you decide to address Simons nit feel free to add my:
> 
> Reviewed-and-tested-by: Jan Karcher <jaka@linux.ibm.com>
> 
> Thanks for your effort and contribution.
> - J

Thank you all for the review and test! I will address Simon's nit and
collect the Rb tags in the next version.

Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2024-04-26  8:17 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-14  4:02 [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wen Gu
2024-04-14  4:02 ` [PATCH net-next v6 01/11] net/smc: decouple ism_client from SMC-D DMB registration Wen Gu
2024-04-15  8:41   ` Alexandra Winter
2024-04-16 13:58     ` Wen Gu
2024-04-25 11:29     ` Wenjia Zhang
2024-04-14  4:02 ` [PATCH net-next v6 02/11] net/smc: introduce loopback-ism for SMC intra-OS shortcut Wen Gu
2024-04-17 10:20   ` Gerd Bayer
2024-04-14  4:02 ` [PATCH net-next v6 03/11] net/smc: implement ID-related operations of loopback-ism Wen Gu
2024-04-14  4:02 ` [PATCH net-next v6 04/11] net/smc: implement DMB-related " Wen Gu
2024-04-14  4:02 ` [PATCH net-next v6 05/11] net/smc: mark optional smcd_ops and check for support when called Wen Gu
2024-04-14  4:02 ` [PATCH net-next v6 06/11] net/smc: ignore loopback-ism when dumping SMC-D devices Wen Gu
2024-04-14  4:03 ` [PATCH net-next v6 07/11] net/smc: register loopback-ism into SMC-D device list Wen Gu
2024-04-25 11:29   ` Wenjia Zhang
2024-04-25 13:29     ` Alexandra Winter
2024-04-26  8:04       ` Wenjia Zhang
2024-04-26  8:09         ` Wen Gu
2024-04-14  4:03 ` [PATCH net-next v6 08/11] net/smc: add operations to merge sndbuf with peer DMB Wen Gu
2024-04-14  4:03 ` [PATCH net-next v6 09/11] net/smc: {at|de}tach sndbuf to peer DMB if supported Wen Gu
2024-04-14  4:03 ` [PATCH net-next v6 10/11] net/smc: adapt cursor update when sndbuf and peer DMB are merged Wen Gu
2024-04-16 11:05   ` Simon Horman
2024-04-16 14:06     ` Wen Gu
2024-04-14  4:03 ` [PATCH net-next v6 11/11] net/smc: implement DMB-merged operations of loopback-ism Wen Gu
2024-04-17 15:36 ` [PATCH net-next v6 00/11] net/smc: SMC intra-OS shortcut with loopback-ism Wenjia Zhang
2024-04-23  6:45   ` Wen Gu
2024-04-25 11:30 ` Wenjia Zhang
2024-04-26  7:02 ` Jan Karcher
2024-04-26  8:17   ` Wen Gu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.