All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Xueming Li <xuemingl@nvidia.com>
To: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Cc: <dev@dpdk.org>, <xuemingl@nvidia.com>,
	Matan Azrad <matan@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>,
	Anatoly Burakov <anatoly.burakov@intel.com>
Subject: [dpdk-dev] [RFC 14/14] net/mlx5: support SubFunction
Date: Thu, 27 May 2021 17:02:02 +0300	[thread overview]
Message-ID: <20210527140202.19377-5-xuemingl@nvidia.com> (raw)
In-Reply-To: <20210527140202.19377-1-xuemingl@nvidia.com>

This patch introduces SF support. Similar to VF, SF on auxiliary bus is
a portion of hardware PF, no representor or bonding parameters for SF.

Devargs to support SF:
-a auxiliary:mlx5_core.sf.8,dv_flow_en=1

New global syntax to support SF:
-a bus=auxiliary,name=mlx5_core.sf.8/class=eth/driver=mlx5,dv_flow_en=1

Signed-off-by: Xueming Li <xuemingl@nvidia.com>
---
 doc/guides/nics/mlx5.rst                | 339 +++++++++++++++++++++++-
 drivers/net/mlx5/linux/mlx5_ethdev_os.c |  12 +-
 drivers/net/mlx5/linux/mlx5_os.c        | 142 +++++++---
 drivers/net/mlx5/linux/mlx5_os.h        |   2 +
 drivers/net/mlx5/mlx5.c                 |  10 +-
 drivers/net/mlx5/mlx5.h                 |   1 +
 drivers/net/mlx5/mlx5_rxmode.c          |   8 +-
 drivers/net/mlx5/mlx5_trigger.c         |   2 +-
 8 files changed, 452 insertions(+), 64 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 83299646dd..3f5692038c 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -403,6 +403,300 @@ Limitations
   - Hairpin between two ports could only manual binding and explicit Tx flow mode. For single port hairpin, all the combinations of auto/manual binding and explicit/implicit Tx flow mode could be supported.
   - Hairpin in switchdev SR-IOV mode is not supported till now.
 
+- Meter:
+
+Limitations
+-----------
+
+- Windows support:
+
+  On Windows, the features are limited:
+
+  - Promiscuous mode is not supported
+  - The following rules are supported:
+
+    - IPv4/UDP with CVLAN filtering
+    - Unicast MAC filtering
+
+- For secondary process:
+
+  - Forked secondary process not supported.
+  - External memory unregistered in EAL memseg list cannot be used for DMA
+    unless such memory has been registered by ``mlx5_mr_update_ext_mp()`` in
+    primary process and remapped to the same virtual address in secondary
+    process. If the external memory is registered by primary process but has
+    different virtual address in secondary process, unexpected error may happen.
+
+- When using Verbs flow engine (``dv_flow_en`` = 0), flow pattern without any
+  specific VLAN will match for VLAN packets as well:
+
+  When VLAN spec is not specified in the pattern, the matching rule will be created with VLAN as a wild card.
+  Meaning, the flow rule::
+
+        flow create 0 ingress pattern eth / vlan vid is 3 / ipv4 / end ...
+
+  Will only match vlan packets with vid=3. and the flow rule::
+
+        flow create 0 ingress pattern eth / ipv4 / end ...
+
+  Will match any ipv4 packet (VLAN included).
+
+- When using Verbs flow engine (``dv_flow_en`` = 0), multi-tagged(QinQ) match is not supported.
+
+- When using DV flow engine (``dv_flow_en`` = 1), flow pattern with any VLAN specification will match only single-tagged packets unless the ETH item ``type`` field is 0x88A8 or the VLAN item ``has_more_vlan`` field is 1.
+  The flow rule::
+
+        flow create 0 ingress pattern eth / ipv4 / end ...
+
+  Will match any ipv4 packet.
+  The flow rules::
+
+        flow create 0 ingress pattern eth / vlan / end ...
+        flow create 0 ingress pattern eth has_vlan is 1 / end ...
+        flow create 0 ingress pattern eth type is 0x8100 / end ...
+
+  Will match single-tagged packets only, with any VLAN ID value.
+  The flow rules::
+
+        flow create 0 ingress pattern eth type is 0x88A8 / end ...
+        flow create 0 ingress pattern eth / vlan has_more_vlan is 1 / end ...
+
+  Will match multi-tagged packets only, with any VLAN ID value.
+
+- A flow pattern with 2 sequential VLAN items is not supported.
+
+- VLAN pop offload command:
+
+  - Flow rules having a VLAN pop offload command as one of their actions and
+    are lacking a match on VLAN as one of their items are not supported.
+  - The command is not supported on egress traffic in NIC mode.
+
+- VLAN push offload is not supported on ingress traffic in NIC mode.
+
+- VLAN set PCP offload is not supported on existing headers.
+
+- A multi segment packet must have not more segments than reported by dev_infos_get()
+  in tx_desc_lim.nb_seg_max field. This value depends on maximal supported Tx descriptor
+  size and ``txq_inline_min`` settings and may be from 2 (worst case forced by maximal
+  inline settings) to 58.
+
+- Flows with a VXLAN Network Identifier equal (or ends to be equal)
+  to 0 are not supported.
+
+- L3 VXLAN and VXLAN-GPE tunnels cannot be supported together with MPLSoGRE and MPLSoUDP.
+
+- Match on Geneve header supports the following fields only:
+
+     - VNI
+     - OAM
+     - protocol type
+     - options length
+
+- Match on Geneve TLV option is supported on the following fields:
+
+     - Class
+     - Type
+     - Length
+     - Data
+
+  Only one Class/Type/Length Geneve TLV option is supported per shared device.
+  Class/Type/Length fields must be specified as well as masks.
+  Class/Type/Length specified masks must be full.
+  Matching Geneve TLV option without specifying data is not supported.
+  Matching Geneve TLV option with ``data & mask == 0`` is not supported.
+
+- VF: flow rules created on VF devices can only match traffic targeted at the
+  configured MAC addresses (see ``rte_eth_dev_mac_addr_add()``).
+
+- Match on GTP tunnel header item supports the following fields only:
+
+     - v_pt_rsv_flags: E flag, S flag, PN flag
+     - msg_type
+     - teid
+
+- Match on GTP extension header only for GTP PDU session container (next
+  extension header type = 0x85).
+- Match on GTP extension header is not supported in group 0.
+
+- No Tx metadata go to the E-Switch steering domain for the Flow group 0.
+  The flows within group 0 and set metadata action are rejected by hardware.
+
+.. note::
+
+   MAC addresses not already present in the bridge table of the associated
+   kernel network device will be added and cleaned up by the PMD when closing
+   the device. In case of ungraceful program termination, some entries may
+   remain present and should be removed manually by other means.
+
+- Buffer split offload is supported with regular Rx burst routine only,
+  no MPRQ feature or vectorized code can be engaged.
+
+- When Multi-Packet Rx queue is configured (``mprq_en``), a Rx packet can be
+  externally attached to a user-provided mbuf with having EXT_ATTACHED_MBUF in
+  ol_flags. As the mempool for the external buffer is managed by PMD, all the
+  Rx mbufs must be freed before the device is closed. Otherwise, the mempool of
+  the external buffers will be freed by PMD and the application which still
+  holds the external buffers may be corrupted.
+
+- If Multi-Packet Rx queue is configured (``mprq_en``) and Rx CQE compression is
+  enabled (``rxq_cqe_comp_en``) at the same time, RSS hash result is not fully
+  supported. Some Rx packets may not have PKT_RX_RSS_HASH.
+
+- IPv6 Multicast messages are not supported on VM, while promiscuous mode
+  and allmulticast mode are both set to off.
+  To receive IPv6 Multicast messages on VM, explicitly set the relevant
+  MAC address using rte_eth_dev_mac_addr_add() API.
+
+- To support a mixed traffic pattern (some buffers from local host memory, some
+  buffers from other devices) with high bandwidth, a mbuf flag is used.
+
+  An application hints the PMD whether or not it should try to inline the
+  given mbuf data buffer. PMD should do the best effort to act upon this request.
+
+  The hint flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE`` is dynamic,
+  registered by application with rte_mbuf_dynflag_register(). This flag is
+  purely driver-specific and declared in PMD specific header ``rte_pmd_mlx5.h``,
+  which is intended to be used by the application.
+
+  To query the supported specific flags in runtime,
+  the function ``rte_pmd_mlx5_get_dyn_flag_names`` returns the array of
+  currently (over present hardware and configuration) supported specific flags.
+  The "not inline hint" feature operating flow is the following one:
+
+    - application starts
+    - probe the devices, ports are created
+    - query the port capabilities
+    - if port supporting the feature is found
+    - register dynamic flag ``RTE_PMD_MLX5_FINE_GRANULARITY_INLINE``
+    - application starts the ports
+    - on ``dev_start()`` PMD checks whether the feature flag is registered and
+      enables the feature support in datapath
+    - application might set the registered flag bit in ``ol_flags`` field
+      of mbuf being sent and PMD will handle ones appropriately.
+
+- The amount of descriptors in Tx queue may be limited by data inline settings.
+  Inline data require the more descriptor building blocks and overall block
+  amount may exceed the hardware supported limits. The application should
+  reduce the requested Tx size or adjust data inline settings with
+  ``txq_inline_max`` and ``txq_inline_mpw`` devargs keys.
+
+- To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
+  parameter should be specified.
+  When PMD sees the RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME set on the packet
+  being sent it tries to synchronize the time of packet appearing on
+  the wire with the specified packet timestamp. It the specified one
+  is in the past it should be ignored, if one is in the distant future
+  it should be capped with some reasonable value (in range of seconds).
+  These specific cases ("too late" and "distant future") can be optionally
+  reported via device xstats to assist applications to detect the
+  time-related problems.
+
+  The timestamp upper "too-distant-future" limit
+  at the moment of invoking the Tx burst routine
+  can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+  Please note, for the testpmd txonly mode,
+  the limit is deduced from the expression::
+
+        (n_tx_descriptors / burst_size + 1) * inter_burst_gap
+
+  There is no any packet reordering according timestamps is supposed,
+  neither within packet burst, nor between packets, it is an entirely
+  application responsibility to generate packets and its timestamps
+  in desired order. The timestamps can be put only in the first packet
+  in the burst providing the entire burst scheduling.
+
+- E-Switch decapsulation Flow:
+
+  - can be applied to PF port only.
+  - must specify VF port action (packet redirection from PF to VF).
+  - optionally may specify tunnel inner source and destination MAC addresses.
+
+- E-Switch  encapsulation Flow:
+
+  - can be applied to VF ports only.
+  - must specify PF port action (packet redirection from VF to PF).
+
+- Raw encapsulation:
+
+  - The input buffer, used as outer header, is not validated.
+
+- Raw decapsulation:
+
+  - The decapsulation is always done up to the outermost tunnel detected by the HW.
+  - The input buffer, providing the removal size, is not validated.
+  - The buffer size must match the length of the headers to be removed.
+
+- ICMP(code/type/identifier/sequence number) / ICMP6(code/type) matching, IP-in-IP and MPLS flow matching are all
+  mutually exclusive features which cannot be supported together
+  (see :ref:`mlx5_firmware_config`).
+
+- LRO:
+
+  - Requires DevX and DV flow to be enabled.
+  - KEEP_CRC offload cannot be supported with LRO.
+  - The first mbuf length, without head-room,  must be big enough to include the
+    TCP header (122B).
+  - Rx queue with LRO offload enabled, receiving a non-LRO packet, can forward
+    it with size limited to max LRO size, not to max RX packet length.
+  - LRO can be used with outer header of TCP packets of the standard format:
+        eth (with or without vlan) / ipv4 or ipv6 / tcp / payload
+
+    Other TCP packets (e.g. with MPLS label) received on Rx queue with LRO enabled, will be received with bad checksum.
+  - LRO packet aggregation is performed by HW only for packet size larger than
+    ``lro_min_mss_size``. This value is reported on device start, when debug
+    mode is enabled.
+
+- CRC:
+
+  - ``DEV_RX_OFFLOAD_KEEP_CRC`` cannot be supported with decapsulation
+    for some NICs (such as ConnectX-6 Dx, ConnectX-6 Lx, and BlueField-2).
+    The capability bit ``scatter_fcs_w_decap_disable`` shows NIC support.
+
+- TX mbuf fast free:
+
+  - fast free offload assumes the all mbufs being sent are originated from the
+    same memory pool and there is no any extra references to the mbufs (the
+    reference counter for each mbuf is equal 1 on tx_burst call). The latter
+    means there should be no any externally attached buffers in mbufs. It is
+    an application responsibility to provide the correct mbufs if the fast
+    free offload is engaged. The mlx5 PMD implicitly produces the mbufs with
+    externally attached buffers if MPRQ option is enabled, hence, the fast
+    free offload is neither supported nor advertised if there is MPRQ enabled.
+
+- Sample flow:
+
+  - Supports ``RTE_FLOW_ACTION_TYPE_SAMPLE`` action only within NIC Rx and
+    E-Switch steering domain.
+  - For E-Switch Sampling flow with sample ratio > 1, additional actions are not
+    supported in the sample actions list.
+  - For ConnectX-5, the ``RTE_FLOW_ACTION_TYPE_SAMPLE`` is typically used as
+    first action in the E-Switch egress flow if with header modify or
+    encapsulation actions.
+  - For NIC Rx flow, supports ``MARK``, ``COUNT``, ``QUEUE``, ``RSS`` in the
+    sample actions list.
+  - For E-Switch mirroring flow, supports ``RAW ENCAP``, ``Port ID``,
+    ``VXLAN ENCAP``, ``NVGRE ENCAP`` in the sample actions list.
+
+- Modify Field flow:
+
+  - Supports the 'set' operation only for ``RTE_FLOW_ACTION_TYPE_MODIFY_FIELD`` action.
+  - Modification of an arbitrary place in a packet via the special ``RTE_FLOW_FIELD_START`` Field ID is not supported.
+  - Modification of the 802.1Q Tag, VXLAN Network or GENEVE Network ID's is not supported.
+  - Encapsulation levels are not supported, can modify outermost header fields only.
+  - Offsets must be 32-bits aligned, cannot skip past the boundary of a field.
+
+- IPv6 header item 'proto' field, indicating the next header protocol, should
+  not be set as extension header.
+  In case the next header is an extension header, it should not be specified in
+  IPv6 header item 'proto' field.
+  The last extension header item 'next header' field can specify the following
+  header protocol type.
+
+- Hairpin:
+
+  - Hairpin between two ports could only manual binding and explicit Tx flow mode. For single port hairpin, all the combinations of auto/manual binding and explicit/implicit Tx flow mode could be supported.
+  - Hairpin in switchdev SR-IOV mode is not supported till now.
+
 - Meter:
 
   - All the meter colors with drop action will be counted only by the global drop statistics.
@@ -1438,13 +1732,17 @@ the DPDK application.
 
         echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
 
-Sub-Function representor
-------------------------
+SubFunction support
+-------------------
+SubFunction is a portion of the PCI device, a SF netdev has its own
+dedicated queues(txq, rxq). A SF shares PCI level resources with other SFs
+and/or with its parent PCI function.
 
-Sub-Function is a portion of the PCI device, a SF netdev has its own
-dedicated queues(txq, rxq). A SF netdev supports E-Switch representation
-offload similar to existing PF and VF representors. A SF shares PCI
-level resources with other SFs and/or with its parent PCI function.
+0. Requirement::
+
+        kernel version >= 5.12 or OFED version >= 5.6
+
+        iproute2 >= 5.11
 
 1. Configure SF feature::
 
@@ -1457,21 +1755,34 @@ level resources with other SFs and/or with its parent PCI function.
             2: 32 SFs
             3: 64 SFs
 
-2. Reset the FW::
+2. Enable switchdev mode::
 
-        mlxfwreset -d <mst device> reset
+        devlink dev eswitch set pci/<DBDF> mode switchdev
 
-3. Enable switchdev mode::
+3. Add SF port::
 
-        echo switchdev > /sys/class/net/<net device>/compat/devlink/mode
+        devlink port add pci/<DBDF> flavour pcisf pfnum 0 sfnum <sfnum>
+
+        Get SFID from output: pci/<DBDF>/<SFID>
+
+4. Modify MAC address::
+
+        devlink port function set pci/<DBDF>/<SFID> hw_addr <MAC>
+
+5. Activate SF port::
+
+        devlink port function set pci/<DBDF>/<ID> state active
 
-4. Create SF::
+6. Devargs to probe SF device::
 
-        mlnx-sf -d <PCI_BDF> -a create
+        auxiliary:mlx5_core.sf.9,dv_flow_en=1
 
-5. Probe SF representor::
+SubFunction representor support
+-------------------------------
+A SF netdev supports E-Switch representation offload similar to existing PF
+and VF representors. Use <sfnum> to probe SF representor.
 
-        testpmd> port attach <PCI_BDF>,representor=sf0,dv_flow_en=1
+        testpmd> port attach <PCI_BDF>,representor=sf<sfnum>,dv_flow_en=1
 
 Performance tuning
 ------------------
diff --git a/drivers/net/mlx5/linux/mlx5_ethdev_os.c b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
index 6fdb310129..8678502595 100644
--- a/drivers/net/mlx5/linux/mlx5_ethdev_os.c
+++ b/drivers/net/mlx5/linux/mlx5_ethdev_os.c
@@ -128,6 +128,17 @@ struct ethtool_link_settings {
 #define ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT 2 /* 66 - 64 */
 #endif
 
+/* Get interface index from SubFunction device name. */
+int
+mlx5_auxiliary_get_ifindex(const char *sf_name)
+{
+	char if_name[IF_NAMESIZE];
+
+	if (mlx5_auxiliary_get_child_name(sf_name, "/net",
+					  if_name, sizeof(if_name)) != 0)
+		return -rte_errno;
+	return if_nametoindex(if_name);
+}
 
 /**
  * Get interface name from private structure.
@@ -1619,4 +1630,3 @@ mlx5_get_mac(struct rte_eth_dev *dev, uint8_t (*mac)[RTE_ETHER_ADDR_LEN])
 	memcpy(mac, request.ifr_hwaddr.sa_data, RTE_ETHER_ADDR_LEN);
 	return 0;
 }
-
diff --git a/drivers/net/mlx5/linux/mlx5_os.c b/drivers/net/mlx5/linux/mlx5_os.c
index 4f16230fa5..d74273a7ca 100644
--- a/drivers/net/mlx5/linux/mlx5_os.c
+++ b/drivers/net/mlx5/linux/mlx5_os.c
@@ -20,6 +20,7 @@
 #include <ethdev_pci.h>
 #include <rte_pci.h>
 #include <rte_bus_pci.h>
+#include <rte_bus_auxiliary.h>
 #include <rte_common.h>
 #include <rte_kvargs.h>
 #include <rte_rwlock.h>
@@ -1923,6 +1924,27 @@ mlx5_device_bond_pci_match(const struct ibv_device *ibv_dev,
 	return pf;
 }
 
+static void
+mlx5_os_config_default(struct mlx5_dev_config *config)
+{
+	memset(config, 0, sizeof(*config));
+	config->mps = MLX5_ARG_UNSET;
+	config->dbnc = MLX5_ARG_UNSET;
+	config->rx_vec_en = 1;
+	config->txq_inline_max = MLX5_ARG_UNSET;
+	config->txq_inline_min = MLX5_ARG_UNSET;
+	config->txq_inline_mpw = MLX5_ARG_UNSET;
+	config->txqs_inline = MLX5_ARG_UNSET;
+	config->vf_nl_en = 1;
+	config->mr_ext_memseg_en = 1;
+	config->mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
+	config->mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
+	config->dv_esw_en = 1;
+	config->dv_flow_en = 1;
+	config->decap_en = 1;
+	config->log_hp_size = MLX5_ARG_UNSET;
+}
+
 /**
  * Register a PCI device within bonding.
  *
@@ -2334,23 +2356,8 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 		uint32_t restore;
 
 		/* Default configuration. */
-		memset(&dev_config, 0, sizeof(struct mlx5_dev_config));
+		mlx5_os_config_default(&dev_config);
 		dev_config.vf = dev_config_vf;
-		dev_config.mps = MLX5_ARG_UNSET;
-		dev_config.dbnc = MLX5_ARG_UNSET;
-		dev_config.rx_vec_en = 1;
-		dev_config.txq_inline_max = MLX5_ARG_UNSET;
-		dev_config.txq_inline_min = MLX5_ARG_UNSET;
-		dev_config.txq_inline_mpw = MLX5_ARG_UNSET;
-		dev_config.txqs_inline = MLX5_ARG_UNSET;
-		dev_config.vf_nl_en = 1;
-		dev_config.mr_ext_memseg_en = 1;
-		dev_config.mprq.max_memcpy_len = MLX5_MPRQ_MEMCPY_DEFAULT_LEN;
-		dev_config.mprq.min_rxqs_num = MLX5_MPRQ_MIN_RXQS;
-		dev_config.dv_esw_en = 1;
-		dev_config.dv_flow_en = 1;
-		dev_config.decap_en = 1;
-		dev_config.log_hp_size = MLX5_ARG_UNSET;
 		list[i].eth_dev = mlx5_dev_spawn(&pci_dev->device,
 						 &list[i],
 						 &dev_config,
@@ -2407,6 +2414,35 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 	return ret;
 }
 
+static int
+mlx5_os_parse_eth_devargs(struct rte_device *dev,
+			  struct rte_eth_devargs *eth_da)
+{
+	int ret = 0;
+
+	if (dev->devargs == NULL)
+		return 0;
+	memset(eth_da, 0, sizeof(*eth_da));
+	/* Parse representor information first from class argument. */
+	if (dev->devargs->cls_str)
+		ret = rte_eth_devargs_parse(dev->devargs->cls_str, eth_da);
+	if (ret != 0) {
+		DRV_LOG(ERR, "failed to parse device arguments: %s",
+			dev->devargs->cls_str);
+		return -rte_errno;
+	}
+	if (eth_da->type == RTE_ETH_REPRESENTOR_NONE) {
+		/* Parse legacy device argument */
+		ret = rte_eth_devargs_parse(dev->devargs->args, eth_da);
+		if (ret) {
+			DRV_LOG(ERR, "failed to parse device arguments: %s",
+				dev->devargs->args);
+			return -rte_errno;
+		}
+	}
+	return 0;
+}
+
 /**
  * Callback to register a PCI device.
  *
@@ -2421,31 +2457,13 @@ mlx5_os_pci_probe_pf(struct rte_pci_device *pci_dev,
 static int
 mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 {
-	struct rte_eth_devargs eth_da = { .type = RTE_ETH_REPRESENTOR_NONE };
+	struct rte_eth_devargs eth_da = { .nb_ports = 0 };
 	int ret = 0;
 	uint16_t p;
 
-	if (pci_dev->device.devargs) {
-		/* Parse representor information from device argument. */
-		if (pci_dev->device.devargs->cls_str)
-			ret = rte_eth_devargs_parse
-				(pci_dev->device.devargs->cls_str, &eth_da);
-		if (ret) {
-			DRV_LOG(ERR, "failed to parse device arguments: %s",
-				pci_dev->device.devargs->cls_str);
-			return -rte_errno;
-		}
-		if (eth_da.type == RTE_ETH_REPRESENTOR_NONE) {
-			/* Support legacy device argument */
-			ret = rte_eth_devargs_parse
-				(pci_dev->device.devargs->args, &eth_da);
-			if (ret) {
-				DRV_LOG(ERR, "failed to parse device arguments: %s",
-					pci_dev->device.devargs->args);
-				return -rte_errno;
-			}
-		}
-	}
+	ret = mlx5_os_parse_eth_devargs(&pci_dev->device, &eth_da);
+	if (ret != 0)
+		return ret;
 
 	if (eth_da.nb_ports > 0) {
 		/* Iterate all port if devargs pf is range: "pf[0-1]vf[...]". */
@@ -2458,10 +2476,53 @@ mlx5_os_pci_probe(struct rte_pci_device *pci_dev)
 	return ret;
 }
 
+/* Probe a single SF device on auxiliary bus, no representor support. */
+static int
+mlx5_os_auxiliary_probe(struct rte_device *dev)
+{
+	struct rte_eth_devargs eth_da = { .nb_ports = 0 };
+	struct mlx5_dev_config config;
+	struct mlx5_dev_spawn_data spawn = { .pf_bond = -1 };
+	struct rte_auxiliary_device *adev = RTE_DEV_TO_AUXILIARY(dev);
+	struct rte_eth_dev *eth_dev;
+	int ret = 0;
+
+	/* Parse ethdev devargs. */
+	ret = mlx5_os_parse_eth_devargs(dev, &eth_da);
+	if (ret != 0)
+		return ret;
+	/* Set default config data. */
+	mlx5_os_config_default(&config);
+	config.sf = 1;
+	/* Init spawn data. */
+	spawn.max_port = 1;
+	spawn.phys_port = 1;
+	spawn.phys_dev = mlx5_get_ibv_device(dev);
+	ret = mlx5_auxiliary_get_ifindex(dev->name);
+	if (ret < 0) {
+		DRV_LOG(ERR, "failed to get ethdev ifindex: %s", dev->name);
+		return ret;
+	}
+	spawn.ifindex = ret;
+	/* Spawn device. */
+	eth_dev = mlx5_dev_spawn(dev, &spawn, &config, &eth_da);
+	if (eth_dev == NULL)
+		return -rte_errno;
+	/* Post create. */
+	eth_dev->intr_handle = &adev->intr_handle;
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+		eth_dev->data->dev_flags |= RTE_ETH_DEV_INTR_RMV;
+		eth_dev->data->numa_node = dev->numa_node;
+	}
+	rte_eth_dev_probing_finish(eth_dev);
+	return 0;
+}
+
 /**
  * Common bus driver callback to probe a device.
  *
- * This function probe PCI bus device(s).
+ * This function probe PCI bus device(s) or a single SF on auxiliary bus.
  *
  * @param[in] dev
  *   Pointer to the generic device.
@@ -2484,7 +2545,8 @@ mlx5_os_net_probe(struct rte_device *dev)
 	}
 	if (mlx5_dev_is_pci(dev))
 		return mlx5_os_pci_probe(RTE_DEV_TO_PCI(dev));
-	return 0;
+	else
+		return mlx5_os_auxiliary_probe(dev);
 }
 
 static int
diff --git a/drivers/net/mlx5/linux/mlx5_os.h b/drivers/net/mlx5/linux/mlx5_os.h
index af7cbeb418..2991d37df2 100644
--- a/drivers/net/mlx5/linux/mlx5_os.h
+++ b/drivers/net/mlx5/linux/mlx5_os.h
@@ -19,4 +19,6 @@ enum {
 
 #define MLX5_NAMESIZE IF_NAMESIZE
 
+int mlx5_auxiliary_get_ifindex(const char *sf_name);
+
 #endif /* RTE_PMD_MLX5_OS_H_ */
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 3defdb2db3..69edd55b86 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2319,10 +2319,12 @@ mlx5_eth_find_next(uint16_t port_id, struct rte_eth_dev *odev)
 			if (opriv->sh == priv->sh ||
 			    odev->device == dev->device)
 				break;
-		} else if (dev->device != NULL && dev->device->driver &&
-			dev->device->driver->name &&
-			!strcmp(dev->device->driver->name,
-				MLX5_PCI_DRIVER_NAME)) {
+		} else if (dev->device != NULL && dev->device->driver != NULL &&
+			dev->device->driver->name != NULL &&
+			(strcmp(dev->device->driver->name,
+				MLX5_PCI_DRIVER_NAME) == 0 ||
+			 strcmp(dev->device->driver->name,
+				MLX5_AUXILIARY_DRIVER_NAME) == 0)) {
 			/* odev not specified, found all mlx5 devices. */
 			break;
 		}
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 27bb34e827..b06f45fc54 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -220,6 +220,7 @@ struct mlx5_dev_config {
 	unsigned int hw_fcs_strip:1; /* FCS stripping is supported. */
 	unsigned int hw_padding:1; /* End alignment padding is supported. */
 	unsigned int vf:1; /* This is a VF. */
+	unsigned int sf:1; /* This is a SF. */
 	unsigned int tunnel_en:1;
 	/* Whether tunnel stateless offloads are supported. */
 	unsigned int mpls_en:1; /* MPLS over GRE/UDP is enabled. */
diff --git a/drivers/net/mlx5/mlx5_rxmode.c b/drivers/net/mlx5/mlx5_rxmode.c
index 25fb47c9ed..7f19b235c2 100644
--- a/drivers/net/mlx5/mlx5_rxmode.c
+++ b/drivers/net/mlx5/mlx5_rxmode.c
@@ -36,7 +36,7 @@ mlx5_promiscuous_enable(struct rte_eth_dev *dev)
 			dev->data->port_id);
 		return 0;
 	}
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_promisc(dev, 1);
 		if (ret)
 			return ret;
@@ -69,7 +69,7 @@ mlx5_promiscuous_disable(struct rte_eth_dev *dev)
 	int ret;
 
 	dev->data->promiscuous = 0;
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_promisc(dev, 0);
 		if (ret)
 			return ret;
@@ -109,7 +109,7 @@ mlx5_allmulticast_enable(struct rte_eth_dev *dev)
 			dev->data->port_id);
 		return 0;
 	}
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_allmulti(dev, 1);
 		if (ret)
 			goto error;
@@ -142,7 +142,7 @@ mlx5_allmulticast_disable(struct rte_eth_dev *dev)
 	int ret;
 
 	dev->data->all_multicast = 0;
-	if (priv->config.vf) {
+	if (priv->config.vf || priv->config.sf) {
 		ret = mlx5_os_set_allmulti(dev, 0);
 		if (ret)
 			goto error;
diff --git a/drivers/net/mlx5/mlx5_trigger.c b/drivers/net/mlx5/mlx5_trigger.c
index 6c8a64ce03..e4e057a6f8 100644
--- a/drivers/net/mlx5/mlx5_trigger.c
+++ b/drivers/net/mlx5/mlx5_trigger.c
@@ -1259,7 +1259,7 @@ mlx5_traffic_enable(struct rte_eth_dev *dev)
 		}
 		mlx5_txq_release(dev, i);
 	}
-	if (priv->config.dv_esw_en && !priv->config.vf) {
+	if (priv->config.dv_esw_en && !priv->config.vf && !priv->config.sf) {
 		if (mlx5_flow_create_esw_table_zero_flow(dev))
 			priv->fdb_def_rule = 1;
 		else
-- 
2.25.1


  parent reply	other threads:[~2021-05-27 14:03 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-27 13:37 [dpdk-dev] [RFC 00/14] mlx5: support SubFunction Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 01/14] common/mlx5: add common device driver Xueming Li
2021-06-10  9:51   ` Thomas Monjalon
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 00/14] net/mlx5: support Sub-Function Xueming Li
2021-07-21 14:37     ` [dpdk-dev] [PATCH v4 00/16] " Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 01/16] common/mlx5: rename eth device class name Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 02/16] common/mlx5: add common device driver Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 03/16] common/mlx5: move description of PCI sysfs functions Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 04/16] common/mlx5: support auxiliary bus Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 05/16] common/mlx5: get PCI device address from any bus Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 06/16] net/mlx5: remove PCI dependency Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 07/16] net/mlx5: migrate to bus-agnostic common driver Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 08/16] net/mlx5: support SubFunction Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 09/16] net/mlx5: check max Verbs port number Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 10/16] regex/mlx5: migrate to common driver Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 11/16] vdpa/mlx5: define driver name as macro Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 12/16] vdpa/mlx5: remove PCI specifics Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 13/16] vdpa/mlx5: support SubFunction Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 14/16] compress/mlx5: migrate to common driver Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 15/16] crypto/mlx5: " Xueming Li
2021-07-21 14:37       ` [dpdk-dev] [PATCH v4 16/16] common/mlx5: clean up legacy PCI bus driver Xueming Li
2021-07-21 22:24       ` [dpdk-dev] [PATCH v4 00/16] net/mlx5: support Sub-Function Thomas Monjalon
2021-07-22  3:03         ` Xueming(Steven) Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 01/14] common/mlx5: add common device driver Xueming Li
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 00/14] net/mlx5: support Sub-Function Xueming Li
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 01/14] common/mlx5: add common device driver Xueming Li
2021-07-14  5:58       ` Slava Ovsiienko
2021-07-18 18:28       ` Thomas Monjalon
2021-07-19  4:05         ` Xueming(Steven) Li
2021-07-19  2:53       ` [dpdk-dev] [PATCH v3 00/15] net/mlx5: support Sub-Function Xueming Li
2021-07-19  2:53       ` [dpdk-dev] [PATCH v3 01/15] common/mlx5: rename eth device class name Xueming Li
2021-07-19  2:53       ` [dpdk-dev] [PATCH v3 02/15] common/mlx5: add common device driver Xueming Li
2021-07-19  2:53       ` [dpdk-dev] [PATCH v3 03/15] common/mlx5: move description of PCI sysfs functions Xueming Li
2021-07-19  2:53       ` [dpdk-dev] [PATCH v3 04/15] common/mlx5: support auxiliary bus Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 05/15] common/mlx5: get PCI device address from any bus Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 06/15] net/mlx5: remove PCI dependency Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 07/15] net/mlx5: migrate to bus-agnostic common driver Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 08/15] net/mlx5: support SubFunction Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 09/15] net/mlx5: check max Verbs port number Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 10/15] regex/mlx5: migrate to common driver Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 11/15] vdpa/mlx5: define driver name as macro Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 12/15] vdpa/mlx5: remove PCI specifics Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 13/15] vdpa/mlx5: support SubFunction Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 14/15] compress/mlx5: migrate to common driver Xueming Li
2021-07-19  2:54       ` [dpdk-dev] [PATCH v3 15/15] common/mlx5: clean up legacy PCI bus driver Xueming Li
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
2021-07-14  5:58       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 03/14] common/mlx5: support auxiliary bus Xueming Li
2021-07-14  5:58       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 04/14] common/mlx5: get PCI device address from any bus Xueming Li
2021-07-14  5:59       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 05/14] net/mlx5: remove PCI dependency Xueming Li
2021-07-14  5:59       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 06/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
2021-07-14  5:59       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 07/14] net/mlx5: support SubFunction Xueming Li
2021-07-14  5:59       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 08/14] net/mlx5: check max Verbs port number Xueming Li
2021-07-14  6:00       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 09/14] regex/mlx5: migrate to common driver Xueming Li
2021-07-14  6:00       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 10/14] vdpa/mlx5: define driver name as macro Xueming Li
2021-07-14  6:00       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 11/14] vdpa/mlx5: remove PCI specifics Xueming Li
2021-07-14  6:08       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 12/14] vdpa/mlx5: support SubFunction Xueming Li
2021-07-14  6:01       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 13/14] compress/mlx5: migrate to common driver Xueming Li
2021-07-14  6:01       ` Slava Ovsiienko
2021-07-13 13:14     ` [dpdk-dev] [PATCH v2 14/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
2021-07-14  6:01       ` Slava Ovsiienko
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 03/14] common/mlx5: support auxiliary bus Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 04/14] common/mlx5: get PCI device address from any bus Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 05/14] net/mlx5: remove PCI dependency Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 06/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 07/14] net/mlx5: support SubFunction Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 08/14] net/mlx5: check max Verbs port number Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 09/14] regex/mlx5: migrate to common driver Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 10/14] vdpa/mlx5: define driver name as macro Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 11/14] vdpa/mlx5: remove PCI specifics Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 12/14] vdpa/mlx5: support SubFunction Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 13/14] compress/mlx5: migrate to common driver Xueming Li
2021-06-16  4:09   ` [dpdk-dev] [PATCH v1 14/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 02/14] common/mlx5: move description of PCI sysfs functions Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 03/14] net/mlx5: remove PCI dependency Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 04/14] net/mlx5: migrate to bus-agnostic common driver Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 05/14] regex/mlx5: migrate to " Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 06/14] compress/mlx5: " Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 07/14] vdpa/mlx5: fix driver name Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 08/14] vdpa/mlx5: remove PCI specifics Xueming Li
2021-05-27 13:37 ` [dpdk-dev] [RFC 09/14] common/mlx5: clean up legacy PCI bus driver Xueming Li
2021-05-27 14:01 ` [dpdk-dev] [RFC 10/14] bus/auxiliary: introduce auxiliary bus Xueming Li
2021-05-27 14:01   ` [dpdk-dev] [RFC 11/14] common/mlx5: support " Xueming Li
2021-05-27 14:02   ` [dpdk-dev] [RFC 12/14] common/mlx5: get PCI device address from any bus Xueming Li
2021-05-27 14:02   ` [dpdk-dev] [RFC 13/14] vdpa/mlx5: support SubFunction Xueming Li
2021-05-27 14:02   ` Xueming Li [this message]
2021-06-10 10:33 ` [dpdk-dev] [RFC 00/14] mlx5: " Ferruh Yigit
2021-06-10 13:23   ` Thomas Monjalon
2021-06-11  5:14     ` Xia, Chenbo
2021-06-11  7:54       ` Thomas Monjalon
2021-06-15  2:10         ` Xia, Chenbo
2021-06-15  4:04           ` Parav Pandit
2021-06-15  5:33             ` Xia, Chenbo
2021-06-15  5:43               ` Parav Pandit
2021-06-15 11:19                 ` Xia, Chenbo
2021-06-15 12:47                   ` Parav Pandit
2021-06-15 15:19                     ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210527140202.19377-5-xuemingl@nvidia.com \
    --to=xuemingl@nvidia.com \
    --cc=anatoly.burakov@intel.com \
    --cc=dev@dpdk.org \
    --cc=matan@nvidia.com \
    --cc=shahafs@nvidia.com \
    --cc=viacheslavo@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.