All the mail mirrored from lore.kernel.org
* [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
@ 2023-09-11  9:38 Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 1/9] net/mlx5: Introduce ifc bits for migration in a chunk mode Yishai Hadas
                   ` (10 more replies)
  0 siblings, 11 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

This series adds 'chunk mode' support to the mlx5 driver for the
migration flow.

Before this series, we were limited to a 4GB state size, since the
device specification defines a 4-byte max size field for the
query/save/load commands.

Once the device supports 'chunk mode', the driver can support a state
size larger than 4GB.

In that case, the device can split a single image into multiple chunks,
as long as the software provides a buffer of at least the minimum size
reported by the device.

The driver queries the minimum required buffer size using the
QUERY_VHCA_MIGRATION_STATE command with the 'chunk' bit set in its
input; in that case, the output includes both the minimum buffer size
and the remaining total size, to be reported/used where applicable.
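
For illustration, the new command fields end up being driven roughly as
below (a minimal sketch matching the code added in patches 1 and 4;
command execution and error handling are omitted):

	MLX5_SET(query_vhca_migration_state_in, in, chunk, 1);
	/* ... execute QUERY_VHCA_MIGRATION_STATE ... */
	state_size = MLX5_GET(query_vhca_migration_state_out, out,
			      required_umem_size);
	total_size = MLX5_GET64(query_vhca_migration_state_out, out,
				remaining_total_size);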

In chunk mode, there may be multiple images to read from the device
during STOP_COPY. The driver reads the full state ahead from the
firmware in small/optimized chunks while letting QEMU/user space read
the available data in parallel.

The chunk buffer size is picked based on the minimum size that the
firmware requires, the total full size, and a max value in the driver
code, which was set to 8MB to achieve an optimized downtime in the
general case.
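
For illustration, the size selection boils down to (a sketch of the
logic added later in the series, where MAX_CHUNK_SIZE is 8MB):

	/* cap each chunk at the driver max, but never go below the
	 * minimum buffer size that the firmware reported
	 */
	chunk_size = min_t(size_t, MAX_CHUNK_SIZE, full_size);
	buf_size = max(state_size, chunk_size);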

With this series in place, we could successfully migrate a device
state larger than 4GB, while even improving the downtime in some
scenarios.

Note:
As the first patch should go through net/mlx5, we may need to send it
as a pull request to VFIO to avoid conflicts before acceptance.

Yishai

Yishai Hadas (9):
  net/mlx5: Introduce ifc bits for migration in a chunk mode
  vfio/mlx5: Wake up the reader post of disabling the SAVING migration
    file
  vfio/mlx5: Refactor the SAVE callback to activate a work only upon an
    error
  vfio/mlx5: Enable querying state size which is > 4GB
  vfio/mlx5: Rename some stuff to match chunk mode
  vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase
  vfio/mlx5: Add support for SAVING in chunk mode
  vfio/mlx5: Add support for READING in chunk mode
  vfio/mlx5: Activate the chunk mode functionality

 drivers/vfio/pci/mlx5/cmd.c   | 103 +++++++++----
 drivers/vfio/pci/mlx5/cmd.h   |  28 +++-
 drivers/vfio/pci/mlx5/main.c  | 283 +++++++++++++++++++++++++---------
 include/linux/mlx5/mlx5_ifc.h |  15 +-
 4 files changed, 322 insertions(+), 107 deletions(-)

-- 
2.18.1


* [PATCH vfio 1/9] net/mlx5: Introduce ifc bits for migration in a chunk mode
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 2/9] vfio/mlx5: Wake up the reader post of disabling the SAVING migration file Yishai Hadas
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

Introduce the ifc bits needed to enable migration in chunk mode.
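
For reference, the new capability bit is consumed at the end of the
series, where the check added in patch 9 reads:

	if (MLX5_CAP_GEN_2(mvdev->mdev, migration_in_chunks))
		mvdev->chunk_mode = 1;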

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 include/linux/mlx5/mlx5_ifc.h | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index fc3db401f8a2..3265bfcb3156 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1948,7 +1948,9 @@ struct mlx5_ifc_cmd_hca_cap_2_bits {
 	u8	   reserved_at_c0[0x8];
 	u8	   migration_multi_load[0x1];
 	u8	   migration_tracking_state[0x1];
-	u8	   reserved_at_ca[0x16];
+	u8	   reserved_at_ca[0x6];
+	u8	   migration_in_chunks[0x1];
+	u8	   reserved_at_d1[0xf];
 
 	u8	   reserved_at_e0[0xc0];
 
@@ -12392,7 +12394,8 @@ struct mlx5_ifc_query_vhca_migration_state_in_bits {
 	u8         op_mod[0x10];
 
 	u8         incremental[0x1];
-	u8         reserved_at_41[0xf];
+	u8         chunk[0x1];
+	u8         reserved_at_42[0xe];
 	u8         vhca_id[0x10];
 
 	u8         reserved_at_60[0x20];
@@ -12408,7 +12411,11 @@ struct mlx5_ifc_query_vhca_migration_state_out_bits {
 
 	u8         required_umem_size[0x20];
 
-	u8         reserved_at_a0[0x160];
+	u8         reserved_at_a0[0x20];
+
+	u8         remaining_total_size[0x40];
+
+	u8         reserved_at_100[0x100];
 };
 
 struct mlx5_ifc_save_vhca_state_in_bits {
@@ -12440,7 +12447,7 @@ struct mlx5_ifc_save_vhca_state_out_bits {
 
 	u8         actual_image_size[0x20];
 
-	u8         reserved_at_60[0x20];
+	u8         next_required_umem_size[0x20];
 };
 
 struct mlx5_ifc_load_vhca_state_in_bits {
-- 
2.18.1


* [PATCH vfio 2/9] vfio/mlx5: Wake up the reader post of disabling the SAVING migration file
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 1/9] net/mlx5: Introduce ifc bits for migration in a chunk mode Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 3/9] vfio/mlx5: Refactor the SAVE callback to activate a work only upon an error Yishai Hadas
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

After disabling the SAVING migration file, which includes setting the
file state to MLX5_MIGF_STATE_ERROR, call wake_up_interruptible() on
its poll_wait member.

This lets any potential reader that is already waiting for data as part
of mlx5vf_save_read() wake up, recognize the error state and return
with an error.

After that, we no longer need to rely on any other condition, such as
the completion of the previously executed SAVE command, to wake up the
reader.

In addition, this change will simplify error flows (e.g. health
recovery) once we move to chunk mode, where multiple SAVE commands may
run in the STOP_COPY phase, as we will no longer need to rely on a SAVE
command to wake up a potentially waiting reader.
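
For context, the reader side blocks roughly like this (a simplified
sketch; the full wait condition in mlx5vf_save_read() checks additional
states as well):

	/* woken either by new data or by the error state set on disable */
	if (wait_event_interruptible(migf->poll_wait,
				     !list_empty(&migf->buf_list) ||
				     migf->state == MLX5_MIGF_STATE_ERROR))
		return -ERESTARTSYS;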

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 42ec574a8622..2556d5455692 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -1019,6 +1019,7 @@ void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev)
 		mlx5_cmd_cleanup_async_ctx(&mvdev->saving_migf->async_ctx);
 		cancel_work_sync(&mvdev->saving_migf->async_data.work);
 		mlx5vf_disable_fd(mvdev->saving_migf);
+		wake_up_interruptible(&mvdev->saving_migf->poll_wait);
 		mlx5fv_cmd_clean_migf_resources(mvdev->saving_migf);
 		fput(mvdev->saving_migf->filp);
 		mvdev->saving_migf = NULL;
-- 
2.18.1


* [PATCH vfio 3/9] vfio/mlx5: Refactor the SAVE callback to activate a work only upon an error
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 1/9] net/mlx5: Introduce ifc bits for migration in a chunk mode Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 2/9] vfio/mlx5: Wake up the reader post of disabling the SAVING migration file Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 4/9] vfio/mlx5: Enable querying state size which is > 4GB Yishai Hadas
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

Upon a successful SAVE callback there is no need to activate a work;
everything required can be done directly.

As such, refactor the above flow to activate a work only upon an error.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 33574b04477d..18d9d1768066 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -475,6 +475,15 @@ mlx5vf_get_data_buffer(struct mlx5_vf_migration_file *migf,
 	return buf;
 }
 
+static void
+mlx5vf_save_callback_complete(struct mlx5_vf_migration_file *migf,
+			      struct mlx5vf_async_data *async_data)
+{
+	kvfree(async_data->out);
+	complete(&migf->save_comp);
+	fput(migf->filp);
+}
+
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 {
 	struct mlx5vf_async_data *async_data = container_of(_work,
@@ -494,9 +503,7 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 		wake_up_interruptible(&migf->poll_wait);
 	}
 	mutex_unlock(&migf->lock);
-	kvfree(async_data->out);
-	complete(&migf->save_comp);
-	fput(migf->filp);
+	mlx5vf_save_callback_complete(migf, async_data);
 }
 
 static int add_buf_header(struct mlx5_vhca_data_buffer *header_buf,
@@ -560,13 +567,12 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		migf->state = async_data->last_chunk ?
 			MLX5_MIGF_STATE_COMPLETE : MLX5_MIGF_STATE_PRE_COPY;
 		wake_up_interruptible(&migf->poll_wait);
+		mlx5vf_save_callback_complete(migf, async_data);
+		return;
 	}
 
 err:
-	/*
-	 * The error and the cleanup flows can't run from an
-	 * interrupt context
-	 */
+	/* The error flow can't run from an interrupt context */
 	if (status == -EREMOTEIO)
 		status = MLX5_GET(save_vhca_state_out, async_data->out, status);
 	async_data->status = status;
-- 
2.18.1


* [PATCH vfio 4/9] vfio/mlx5: Enable querying state size which is > 4GB
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (2 preceding siblings ...)
  2023-09-11  9:38 ` [PATCH vfio 3/9] vfio/mlx5: Refactor the SAVE callback to activate a work only upon an error Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 5/9] vfio/mlx5: Rename some stuff to match chunk mode Yishai Hadas
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

Once the device supports 'chunk mode', the driver can support a state
size larger than 4GB.

In that case, the device can split a single image into multiple chunks,
as long as the software provides a buffer of at least the minimum size
reported by the device.

The driver queries the minimum required buffer size using the
QUERY_VHCA_MIGRATION_STATE command with the 'chunk' bit set in its
input; in that case, the output includes both the minimum buffer size
(i.e. required_umem_size) and the remaining total size, to be
reported/used where applicable.

At this point in the series the 'chunk' bit is off; the last patch will
activate the feature once all the pieces are ready.

Note:
Before this change we were limited to a 4GB state size, since the
device specification defines a 4-byte max size field for the
query/save/load commands.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  |  9 ++++++++-
 drivers/vfio/pci/mlx5/cmd.h  |  4 +++-
 drivers/vfio/pci/mlx5/main.c | 13 +++++++------
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 18d9d1768066..e70d84bf2043 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -86,7 +86,8 @@ int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod)
 }
 
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
-					  size_t *state_size, u8 query_flags)
+					  size_t *state_size, u64 *total_size,
+					  u8 query_flags)
 {
 	u32 out[MLX5_ST_SZ_DW(query_vhca_migration_state_out)] = {};
 	u32 in[MLX5_ST_SZ_DW(query_vhca_migration_state_in)] = {};
@@ -128,6 +129,7 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 	MLX5_SET(query_vhca_migration_state_in, in, op_mod, 0);
 	MLX5_SET(query_vhca_migration_state_in, in, incremental,
 		 query_flags & MLX5VF_QUERY_INC);
+	MLX5_SET(query_vhca_migration_state_in, in, chunk, mvdev->chunk_mode);
 
 	ret = mlx5_cmd_exec_inout(mvdev->mdev, query_vhca_migration_state, in,
 				  out);
@@ -139,6 +141,11 @@ int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
 
 	*state_size = MLX5_GET(query_vhca_migration_state_out, out,
 			       required_umem_size);
+	if (total_size)
+		*total_size = mvdev->chunk_mode ?
+			MLX5_GET64(query_vhca_migration_state_out, out,
+				   remaining_total_size) : *state_size;
+
 	return 0;
 }
 
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index aec4c69dd6c1..4fb37598c8e5 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -164,6 +164,7 @@ struct mlx5vf_pci_core_device {
 	u8 deferred_reset:1;
 	u8 mdev_detach:1;
 	u8 log_active:1;
+	u8 chunk_mode:1;
 	struct completion tracker_comp;
 	/* protect migration state */
 	struct mutex state_mutex;
@@ -186,7 +187,8 @@ enum {
 int mlx5vf_cmd_suspend_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_resume_vhca(struct mlx5vf_pci_core_device *mvdev, u16 op_mod);
 int mlx5vf_cmd_query_vhca_migration_state(struct mlx5vf_pci_core_device *mvdev,
-					  size_t *state_size, u8 query_flags);
+					  size_t *state_size, u64 *total_size,
+					  u8 query_flags);
 void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 			       const struct vfio_migration_ops *mig_ops,
 			       const struct vfio_log_ops *log_ops);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 2556d5455692..90cb36fee6c0 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -428,7 +428,7 @@ static long mlx5vf_precopy_ioctl(struct file *filp, unsigned int cmd,
 		 * As so, the other code below is safe with the proper locks.
 		 */
 		ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &inc_length,
-							    MLX5VF_QUERY_INC);
+							    NULL, MLX5VF_QUERY_INC);
 		if (ret)
 			goto err_state_unlock;
 	}
@@ -505,7 +505,7 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
 	if (migf->state == MLX5_MIGF_STATE_ERROR)
 		return -ENODEV;
 
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length,
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL,
 				MLX5VF_QUERY_INC | MLX5VF_QUERY_FINAL);
 	if (ret)
 		goto err;
@@ -574,7 +574,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	INIT_LIST_HEAD(&migf->buf_list);
 	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, 0);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL, 0);
 	if (ret)
 		goto out_pd;
 
@@ -1195,13 +1195,14 @@ static int mlx5vf_pci_get_data_size(struct vfio_device *vdev,
 	struct mlx5vf_pci_core_device *mvdev = container_of(
 		vdev, struct mlx5vf_pci_core_device, core_device.vdev);
 	size_t state_size;
+	u64 total_size;
 	int ret;
 
 	mutex_lock(&mvdev->state_mutex);
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev,
-						    &state_size, 0);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &state_size,
+						    &total_size, 0);
 	if (!ret)
-		*stop_copy_length = state_size;
+		*stop_copy_length = total_size;
 	mlx5vf_state_mutex_unlock(mvdev);
 	return ret;
 }
-- 
2.18.1


* [PATCH vfio 5/9] vfio/mlx5: Rename some stuff to match chunk mode
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (3 preceding siblings ...)
  2023-09-11  9:38 ` [PATCH vfio 4/9] vfio/mlx5: Enable querying state size which is > 4GB Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 6/9] vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase Yishai Hadas
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

In chunk mode, there may be multiple images to read from the device
during STOP_COPY.

This patch prepares for that mode by renaming the relevant fields and
states to better matching names.

As part of that, be stricter and recognize a PRE_COPY error only when
it did not occur on a STOP_COPY chunk.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c | 15 ++++++++-------
 drivers/vfio/pci/mlx5/cmd.h |  4 ++--
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index e70d84bf2043..7b48a9b80bc6 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -503,7 +503,8 @@ void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work)
 		mlx5vf_put_data_buffer(async_data->buf);
 		if (async_data->header_buf)
 			mlx5vf_put_data_buffer(async_data->header_buf);
-		if (async_data->status == MLX5_CMD_STAT_BAD_RES_STATE_ERR)
+		if (!async_data->stop_copy_chunk &&
+		    async_data->status == MLX5_CMD_STAT_BAD_RES_STATE_ERR)
 			migf->state = MLX5_MIGF_STATE_PRE_COPY_ERROR;
 		else
 			migf->state = MLX5_MIGF_STATE_ERROR;
@@ -553,7 +554,7 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		size_t image_size;
 		unsigned long flags;
 		bool initial_pre_copy = migf->state != MLX5_MIGF_STATE_PRE_COPY &&
-				!async_data->last_chunk;
+				!async_data->stop_copy_chunk;
 
 		image_size = MLX5_GET(save_vhca_state_out, async_data->out,
 				      actual_image_size);
@@ -571,7 +572,7 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		spin_unlock_irqrestore(&migf->list_lock, flags);
 		if (initial_pre_copy)
 			migf->pre_copy_initial_bytes += image_size;
-		migf->state = async_data->last_chunk ?
+		migf->state = async_data->stop_copy_chunk ?
 			MLX5_MIGF_STATE_COMPLETE : MLX5_MIGF_STATE_PRE_COPY;
 		wake_up_interruptible(&migf->poll_wait);
 		mlx5vf_save_callback_complete(migf, async_data);
@@ -623,7 +624,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 
 	async_data = &migf->async_data;
 	async_data->buf = buf;
-	async_data->last_chunk = !track;
+	async_data->stop_copy_chunk = !track;
 	async_data->out = kvzalloc(out_size, GFP_KERNEL);
 	if (!async_data->out) {
 		err = -ENOMEM;
@@ -631,7 +632,7 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
-		if (async_data->last_chunk && migf->buf_header) {
+		if (async_data->stop_copy_chunk && migf->buf_header) {
 			header_buf = migf->buf_header;
 			migf->buf_header = NULL;
 		} else {
@@ -644,8 +645,8 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 		}
 	}
 
-	if (async_data->last_chunk)
-		migf->state = MLX5_MIGF_STATE_SAVE_LAST;
+	if (async_data->stop_copy_chunk)
+		migf->state = MLX5_MIGF_STATE_SAVE_STOP_COPY_CHUNK;
 
 	async_data->header_buf = header_buf;
 	get_file(migf->filp);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 4fb37598c8e5..ac5dca5fe6b1 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -20,7 +20,7 @@ enum mlx5_vf_migf_state {
 	MLX5_MIGF_STATE_ERROR = 1,
 	MLX5_MIGF_STATE_PRE_COPY_ERROR,
 	MLX5_MIGF_STATE_PRE_COPY,
-	MLX5_MIGF_STATE_SAVE_LAST,
+	MLX5_MIGF_STATE_SAVE_STOP_COPY_CHUNK,
 	MLX5_MIGF_STATE_COMPLETE,
 };
 
@@ -78,7 +78,7 @@ struct mlx5vf_async_data {
 	struct mlx5_vhca_data_buffer *buf;
 	struct mlx5_vhca_data_buffer *header_buf;
 	int status;
-	u8 last_chunk:1;
+	u8 stop_copy_chunk:1;
 	void *out;
 };
 
-- 
2.18.1


* [PATCH vfio 6/9] vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (4 preceding siblings ...)
  2023-09-11  9:38 ` [PATCH vfio 5/9] vfio/mlx5: Rename some stuff to match chunk mode Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 7/9] vfio/mlx5: Add support for SAVING in chunk mode Yishai Hadas
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

This patch is another preparation step towards working in chunk mode.

It pre-allocates chunks for the STOP_COPY phase to let the driver use
them immediately and avoid an extra allocation during that phase.

Before this patch we had a single large buffer dedicated to the
STOP_COPY phase, as there was a single SAVE on the source side for the
last image.

Once we move to chunk mode, the idea is to have several small buffers
that will be used during the STOP_COPY phase.

The driver will read the full state ahead from the firmware in
small/optimized chunks while letting QEMU/user space read the available
data in parallel.

Each buffer holds its chunk number so that it can be recognized later
on by the coming patches.

The chunk buffer size is picked based on the minimum size that the
firmware requires, the total full size, and a max value in the driver
code, which was set to 8MB to achieve an optimized downtime in the
general case.

As chunk mode is applicable even when moving directly to STOP_COPY, the
buffer preparation and other related setup are done unconditionally
with regard to STOP/PRE-COPY.

Note:
At this point in the series chunk mode is still not activated, and the
first buffer is used everywhere.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  |  23 +++---
 drivers/vfio/pci/mlx5/cmd.h  |   8 +-
 drivers/vfio/pci/mlx5/main.c | 150 ++++++++++++++++++++++-------------
 3 files changed, 116 insertions(+), 65 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index 7b48a9b80bc6..b18735ee5d07 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -632,9 +632,9 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
-		if (async_data->stop_copy_chunk && migf->buf_header) {
-			header_buf = migf->buf_header;
-			migf->buf_header = NULL;
+		if (async_data->stop_copy_chunk && migf->buf_header[0]) {
+			header_buf = migf->buf_header[0];
+			migf->buf_header[0] = NULL;
 		} else {
 			header_buf = mlx5vf_get_data_buffer(migf,
 				sizeof(struct mlx5_vf_migration_header), DMA_NONE);
@@ -721,18 +721,21 @@ void mlx5vf_cmd_dealloc_pd(struct mlx5_vf_migration_file *migf)
 void mlx5fv_cmd_clean_migf_resources(struct mlx5_vf_migration_file *migf)
 {
 	struct mlx5_vhca_data_buffer *entry;
+	int i;
 
 	lockdep_assert_held(&migf->mvdev->state_mutex);
 	WARN_ON(migf->mvdev->mdev_detach);
 
-	if (migf->buf) {
-		mlx5vf_free_data_buffer(migf->buf);
-		migf->buf = NULL;
-	}
+	for (i = 0; i < MAX_NUM_CHUNKS; i++) {
+		if (migf->buf[i]) {
+			mlx5vf_free_data_buffer(migf->buf[i]);
+			migf->buf[i] = NULL;
+		}
 
-	if (migf->buf_header) {
-		mlx5vf_free_data_buffer(migf->buf_header);
-		migf->buf_header = NULL;
+		if (migf->buf_header[i]) {
+			mlx5vf_free_data_buffer(migf->buf_header[i]);
+			migf->buf_header[i] = NULL;
+		}
 	}
 
 	list_splice(&migf->avail_list, &migf->buf_list);
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index ac5dca5fe6b1..6d8d52804c83 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -64,6 +64,7 @@ struct mlx5_vhca_data_buffer {
 	u32 mkey;
 	enum dma_data_direction dma_dir;
 	u8 dmaed:1;
+	u8 stop_copy_chunk_num;
 	struct list_head buf_elm;
 	struct mlx5_vf_migration_file *migf;
 	/* Optimize mlx5vf_get_migration_page() for sequential access */
@@ -82,6 +83,8 @@ struct mlx5vf_async_data {
 	void *out;
 };
 
+#define MAX_NUM_CHUNKS 2
+
 struct mlx5_vf_migration_file {
 	struct file *filp;
 	struct mutex lock;
@@ -94,8 +97,9 @@ struct mlx5_vf_migration_file {
 	u32 record_tag;
 	u64 stop_copy_prep_size;
 	u64 pre_copy_initial_bytes;
-	struct mlx5_vhca_data_buffer *buf;
-	struct mlx5_vhca_data_buffer *buf_header;
+	/* Upon chunk mode preserve another set of buffers for stop_copy phase */
+	struct mlx5_vhca_data_buffer *buf[MAX_NUM_CHUNKS];
+	struct mlx5_vhca_data_buffer *buf_header[MAX_NUM_CHUNKS];
 	spinlock_t list_lock;
 	struct list_head buf_list;
 	struct list_head avail_list;
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 90cb36fee6c0..351b61303b72 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -24,6 +24,8 @@
 /* Device specification max LOAD size */
 #define MAX_LOAD_SIZE (BIT_ULL(__mlx5_bit_sz(load_vhca_state_in, size)) - 1)
 
+#define MAX_CHUNK_SIZE SZ_8M
+
 static struct mlx5vf_pci_core_device *mlx5vf_drvdata(struct pci_dev *pdev)
 {
 	struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev);
@@ -304,7 +306,8 @@ static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf)
 	wake_up_interruptible(&migf->poll_wait);
 }
 
-static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf)
+static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf,
+				       bool track)
 {
 	size_t size = sizeof(struct mlx5_vf_migration_header) +
 		sizeof(struct mlx5_vf_migration_tag_stop_copy_data);
@@ -331,7 +334,7 @@ static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf)
 	to_buff = kmap_local_page(page);
 	memcpy(to_buff, &header, sizeof(header));
 	header_buf->length = sizeof(header);
-	data.stop_copy_size = cpu_to_le64(migf->buf->allocated_length);
+	data.stop_copy_size = cpu_to_le64(migf->buf[0]->allocated_length);
 	memcpy(to_buff + sizeof(header), &data, sizeof(data));
 	header_buf->length += sizeof(data);
 	kunmap_local(to_buff);
@@ -340,48 +343,83 @@ static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf)
 	spin_lock_irqsave(&migf->list_lock, flags);
 	list_add_tail(&header_buf->buf_elm, &migf->buf_list);
 	spin_unlock_irqrestore(&migf->list_lock, flags);
-	migf->pre_copy_initial_bytes = size;
+	if (track)
+		migf->pre_copy_initial_bytes = size;
 	return 0;
 err:
 	mlx5vf_put_data_buffer(header_buf);
 	return ret;
 }
 
-static int mlx5vf_prep_stop_copy(struct mlx5_vf_migration_file *migf,
-				 size_t state_size)
+static int mlx5vf_prep_stop_copy(struct mlx5vf_pci_core_device *mvdev,
+				 struct mlx5_vf_migration_file *migf,
+				 size_t state_size, u64 full_size,
+				 bool track)
 {
 	struct mlx5_vhca_data_buffer *buf;
 	size_t inc_state_size;
+	int num_chunks;
 	int ret;
+	int i;
 
-	/* let's be ready for stop_copy size that might grow by 10 percents */
-	if (check_add_overflow(state_size, state_size / 10, &inc_state_size))
-		inc_state_size = state_size;
+	if (mvdev->chunk_mode) {
+		size_t chunk_size = min_t(size_t, MAX_CHUNK_SIZE, full_size);
 
-	buf = mlx5vf_get_data_buffer(migf, inc_state_size, DMA_FROM_DEVICE);
-	if (IS_ERR(buf))
-		return PTR_ERR(buf);
+		/* from firmware perspective at least 'state_size' buffer should be set */
+		inc_state_size = max(state_size, chunk_size);
+	} else {
+		if (track) {
+			/* let's be ready for stop_copy size that might grow by 10 percents */
+			if (check_add_overflow(state_size, state_size / 10, &inc_state_size))
+				inc_state_size = state_size;
+		} else {
+			inc_state_size = state_size;
+		}
+	}
 
-	migf->buf = buf;
-	buf = mlx5vf_get_data_buffer(migf,
-			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
-	if (IS_ERR(buf)) {
-		ret = PTR_ERR(buf);
-		goto err;
+	/* let's not overflow the device specification max SAVE size */
+	inc_state_size = min_t(size_t, inc_state_size,
+		(BIT_ULL(__mlx5_bit_sz(save_vhca_state_in, size)) - PAGE_SIZE));
+
+	num_chunks = mvdev->chunk_mode ? MAX_NUM_CHUNKS : 1;
+	for (i = 0; i < num_chunks; i++) {
+		buf = mlx5vf_get_data_buffer(migf, inc_state_size, DMA_FROM_DEVICE);
+		if (IS_ERR(buf)) {
+			ret = PTR_ERR(buf);
+			goto err;
+		}
+
+		migf->buf[i] = buf;
+		buf = mlx5vf_get_data_buffer(migf,
+				sizeof(struct mlx5_vf_migration_header), DMA_NONE);
+		if (IS_ERR(buf)) {
+			ret = PTR_ERR(buf);
+			goto err;
+		}
+		migf->buf_header[i] = buf;
+		if (mvdev->chunk_mode) {
+			migf->buf[i]->stop_copy_chunk_num = i + 1;
+			migf->buf_header[i]->stop_copy_chunk_num = i + 1;
+		}
 	}
 
-	migf->buf_header = buf;
-	ret = mlx5vf_add_stop_copy_header(migf);
+	ret = mlx5vf_add_stop_copy_header(migf, track);
 	if (ret)
-		goto err_header;
+		goto err;
 	return 0;
 
-err_header:
-	mlx5vf_put_data_buffer(migf->buf_header);
-	migf->buf_header = NULL;
 err:
-	mlx5vf_put_data_buffer(migf->buf);
-	migf->buf = NULL;
+	for (i = 0; i < num_chunks; i++) {
+		if (migf->buf[i]) {
+			mlx5vf_put_data_buffer(migf->buf[i]);
+			migf->buf[i] = NULL;
+		}
+		if (migf->buf_header[i]) {
+			mlx5vf_put_data_buffer(migf->buf_header[i]);
+			migf->buf_header[i] = NULL;
+		}
+	}
+
 	return ret;
 }
 
@@ -511,9 +549,9 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
 		goto err;
 
 	/* Checking whether we have a matching pre-allocated buffer that can fit */
-	if (migf->buf && migf->buf->allocated_length >= length) {
-		buf = migf->buf;
-		migf->buf = NULL;
+	if (migf->buf[0]->allocated_length >= length) {
+		buf = migf->buf[0];
+		migf->buf[0] = NULL;
 	} else {
 		buf = mlx5vf_get_data_buffer(migf, length, DMA_FROM_DEVICE);
 		if (IS_ERR(buf)) {
@@ -541,6 +579,7 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	struct mlx5_vf_migration_file *migf;
 	struct mlx5_vhca_data_buffer *buf;
 	size_t length;
+	u64 full_size;
 	int ret;
 
 	migf = kzalloc(sizeof(*migf), GFP_KERNEL_ACCOUNT);
@@ -574,20 +613,25 @@ mlx5vf_pci_save_device_data(struct mlx5vf_pci_core_device *mvdev, bool track)
 	INIT_LIST_HEAD(&migf->buf_list);
 	INIT_LIST_HEAD(&migf->avail_list);
 	spin_lock_init(&migf->list_lock);
-	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, NULL, 0);
+	ret = mlx5vf_cmd_query_vhca_migration_state(mvdev, &length, &full_size, 0);
+	if (ret)
+		goto out_pd;
+
+	ret = mlx5vf_prep_stop_copy(mvdev, migf, length, full_size, track);
 	if (ret)
 		goto out_pd;
 
 	if (track) {
-		ret = mlx5vf_prep_stop_copy(migf, length);
-		if (ret)
+		/* leave the allocated buffer ready for the stop-copy phase */
+		buf = mlx5vf_alloc_data_buffer(migf,
+			migf->buf[0]->allocated_length, DMA_FROM_DEVICE);
+		if (IS_ERR(buf)) {
+			ret = PTR_ERR(buf);
 			goto out_pd;
-	}
-
-	buf = mlx5vf_alloc_data_buffer(migf, length, DMA_FROM_DEVICE);
-	if (IS_ERR(buf)) {
-		ret = PTR_ERR(buf);
-		goto out_pd;
+		}
+	} else {
+		buf = migf->buf[0];
+		migf->buf[0] = NULL;
 	}
 
 	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, false, track);
@@ -820,8 +864,8 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 				   size_t len, loff_t *pos)
 {
 	struct mlx5_vf_migration_file *migf = filp->private_data;
-	struct mlx5_vhca_data_buffer *vhca_buf = migf->buf;
-	struct mlx5_vhca_data_buffer *vhca_buf_header = migf->buf_header;
+	struct mlx5_vhca_data_buffer *vhca_buf = migf->buf[0];
+	struct mlx5_vhca_data_buffer *vhca_buf_header = migf->buf_header[0];
 	loff_t requested_length;
 	bool has_work = false;
 	ssize_t done = 0;
@@ -856,15 +900,15 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 			if (vhca_buf_header->allocated_length < migf->record_size) {
 				mlx5vf_free_data_buffer(vhca_buf_header);
 
-				migf->buf_header = mlx5vf_alloc_data_buffer(migf,
+				migf->buf_header[0] = mlx5vf_alloc_data_buffer(migf,
 						migf->record_size, DMA_NONE);
-				if (IS_ERR(migf->buf_header)) {
-					ret = PTR_ERR(migf->buf_header);
-					migf->buf_header = NULL;
+				if (IS_ERR(migf->buf_header[0])) {
+					ret = PTR_ERR(migf->buf_header[0]);
+					migf->buf_header[0] = NULL;
 					goto out_unlock;
 				}
 
-				vhca_buf_header = migf->buf_header;
+				vhca_buf_header = migf->buf_header[0];
 			}
 
 			vhca_buf_header->start_pos = migf->max_pos;
@@ -884,15 +928,15 @@ static ssize_t mlx5vf_resume_write(struct file *filp, const char __user *buf,
 			if (vhca_buf->allocated_length < size) {
 				mlx5vf_free_data_buffer(vhca_buf);
 
-				migf->buf = mlx5vf_alloc_data_buffer(migf,
+				migf->buf[0] = mlx5vf_alloc_data_buffer(migf,
 							size, DMA_TO_DEVICE);
-				if (IS_ERR(migf->buf)) {
-					ret = PTR_ERR(migf->buf);
-					migf->buf = NULL;
+				if (IS_ERR(migf->buf[0])) {
+					ret = PTR_ERR(migf->buf[0]);
+					migf->buf[0] = NULL;
 					goto out_unlock;
 				}
 
-				vhca_buf = migf->buf;
+				vhca_buf = migf->buf[0];
 			}
 
 			vhca_buf->start_pos = migf->max_pos;
@@ -974,7 +1018,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 		goto out_pd;
 	}
 
-	migf->buf = buf;
+	migf->buf[0] = buf;
 	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
 		buf = mlx5vf_alloc_data_buffer(migf,
 			sizeof(struct mlx5_vf_migration_header), DMA_NONE);
@@ -983,7 +1027,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 			goto out_buf;
 		}
 
-		migf->buf_header = buf;
+		migf->buf_header[0] = buf;
 		migf->load_state = MLX5_VF_LOAD_STATE_READ_HEADER;
 	} else {
 		/* Initial state will be to read the image */
@@ -997,7 +1041,7 @@ mlx5vf_pci_resume_device_data(struct mlx5vf_pci_core_device *mvdev)
 	spin_lock_init(&migf->list_lock);
 	return migf;
 out_buf:
-	mlx5vf_free_data_buffer(migf->buf);
+	mlx5vf_free_data_buffer(migf->buf[0]);
 out_pd:
 	mlx5vf_cmd_dealloc_pd(migf);
 out_free:
@@ -1101,7 +1145,7 @@ mlx5vf_pci_step_device_state_locked(struct mlx5vf_pci_core_device *mvdev,
 		if (!MLX5VF_PRE_COPY_SUPP(mvdev)) {
 			ret = mlx5vf_cmd_load_vhca_state(mvdev,
 							 mvdev->resuming_migf,
-							 mvdev->resuming_migf->buf);
+							 mvdev->resuming_migf->buf[0]);
 			if (ret)
 				return ERR_PTR(ret);
 		}
-- 
2.18.1


* [PATCH vfio 7/9] vfio/mlx5: Add support for SAVING in chunk mode
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (5 preceding siblings ...)
  2023-09-11  9:38 ` [PATCH vfio 6/9] vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 8/9] vfio/mlx5: Add support for READING " Yishai Hadas
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

Add support for SAVING in chunk mode. This includes running a work that
fills the next chunk from the device.

In case the number of available chunks reaches MAX_NUM_CHUNKS, the
SAVING of the next chunk is delayed until the reader consumes one
chunk.
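
In short, the completion callback of one chunk kicks the SAVE of the
next one (a sketch matching the callback changes below):

	/* pick up the next chunk num and queue its SAVE work */
	next_chunk = (buf->stop_copy_chunk_num % MAX_NUM_CHUNKS) + 1;
	mlx5vf_mig_file_set_save_work(migf, next_chunk,
				      next_required_umem_size);

unless MAX_NUM_CHUNKS chunks are already pending consumption, in which
case the request is parked in migf->next_required_umem_size until the
reader consumes a chunk.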

The next patch in the series adds the reader part of the chunk mode.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c  | 43 +++++++++++++++---
 drivers/vfio/pci/mlx5/cmd.h  | 12 ++++++
 drivers/vfio/pci/mlx5/main.c | 84 +++++++++++++++++++++++++++++++-----
 3 files changed, 122 insertions(+), 17 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index b18735ee5d07..e68bf9ba5300 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -435,6 +435,7 @@ mlx5vf_alloc_data_buffer(struct mlx5_vf_migration_file *migf,
 void mlx5vf_put_data_buffer(struct mlx5_vhca_data_buffer *buf)
 {
 	spin_lock_irq(&buf->migf->list_lock);
+	buf->stop_copy_chunk_num = 0;
 	list_add_tail(&buf->buf_elm, &buf->migf->avail_list);
 	spin_unlock_irq(&buf->migf->list_lock);
 }
@@ -551,6 +552,8 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 			struct mlx5_vf_migration_file, async_data);
 
 	if (!status) {
+		size_t next_required_umem_size = 0;
+		bool stop_copy_last_chunk;
 		size_t image_size;
 		unsigned long flags;
 		bool initial_pre_copy = migf->state != MLX5_MIGF_STATE_PRE_COPY &&
@@ -558,6 +561,11 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 
 		image_size = MLX5_GET(save_vhca_state_out, async_data->out,
 				      actual_image_size);
+		if (async_data->buf->stop_copy_chunk_num)
+			next_required_umem_size = MLX5_GET(save_vhca_state_out,
+					async_data->out, next_required_umem_size);
+		stop_copy_last_chunk = async_data->stop_copy_chunk &&
+				!next_required_umem_size;
 		if (async_data->header_buf) {
 			status = add_buf_header(async_data->header_buf, image_size,
 						initial_pre_copy);
@@ -569,12 +577,28 @@ static void mlx5vf_save_callback(int status, struct mlx5_async_work *context)
 		migf->max_pos += async_data->buf->length;
 		spin_lock_irqsave(&migf->list_lock, flags);
 		list_add_tail(&async_data->buf->buf_elm, &migf->buf_list);
+		if (async_data->buf->stop_copy_chunk_num) {
+			migf->num_ready_chunks++;
+			if (next_required_umem_size &&
+			    migf->num_ready_chunks >= MAX_NUM_CHUNKS) {
+				/* Delay the next SAVE till one chunk be consumed */
+				migf->next_required_umem_size = next_required_umem_size;
+				next_required_umem_size = 0;
+			}
+		}
 		spin_unlock_irqrestore(&migf->list_lock, flags);
-		if (initial_pre_copy)
+		if (initial_pre_copy) {
 			migf->pre_copy_initial_bytes += image_size;
-		migf->state = async_data->stop_copy_chunk ?
-			MLX5_MIGF_STATE_COMPLETE : MLX5_MIGF_STATE_PRE_COPY;
+			migf->state = MLX5_MIGF_STATE_PRE_COPY;
+		}
+		if (stop_copy_last_chunk)
+			migf->state = MLX5_MIGF_STATE_COMPLETE;
 		wake_up_interruptible(&migf->poll_wait);
+		if (next_required_umem_size)
+			mlx5vf_mig_file_set_save_work(migf,
+				/* Picking up the next chunk num */
+				(async_data->buf->stop_copy_chunk_num % MAX_NUM_CHUNKS) + 1,
+				next_required_umem_size);
 		mlx5vf_save_callback_complete(migf, async_data);
 		return;
 	}
@@ -632,10 +656,15 @@ int mlx5vf_cmd_save_vhca_state(struct mlx5vf_pci_core_device *mvdev,
 	}
 
 	if (MLX5VF_PRE_COPY_SUPP(mvdev)) {
-		if (async_data->stop_copy_chunk && migf->buf_header[0]) {
-			header_buf = migf->buf_header[0];
-			migf->buf_header[0] = NULL;
-		} else {
+		if (async_data->stop_copy_chunk) {
+			u8 header_idx = buf->stop_copy_chunk_num ?
+				buf->stop_copy_chunk_num - 1 : 0;
+
+			header_buf = migf->buf_header[header_idx];
+			migf->buf_header[header_idx] = NULL;
+		}
+
+		if (!header_buf) {
 			header_buf = mlx5vf_get_data_buffer(migf,
 				sizeof(struct mlx5_vf_migration_header), DMA_NONE);
 			if (IS_ERR(header_buf)) {
diff --git a/drivers/vfio/pci/mlx5/cmd.h b/drivers/vfio/pci/mlx5/cmd.h
index 6d8d52804c83..f2c7227fa683 100644
--- a/drivers/vfio/pci/mlx5/cmd.h
+++ b/drivers/vfio/pci/mlx5/cmd.h
@@ -83,6 +83,13 @@ struct mlx5vf_async_data {
 	void *out;
 };
 
+struct mlx5vf_save_work_data {
+	struct mlx5_vf_migration_file *migf;
+	size_t next_required_umem_size;
+	struct work_struct work;
+	u8 chunk_num;
+};
+
 #define MAX_NUM_CHUNKS 2
 
 struct mlx5_vf_migration_file {
@@ -97,9 +104,12 @@ struct mlx5_vf_migration_file {
 	u32 record_tag;
 	u64 stop_copy_prep_size;
 	u64 pre_copy_initial_bytes;
+	size_t next_required_umem_size;
+	u8 num_ready_chunks;
 	/* Upon chunk mode preserve another set of buffers for stop_copy phase */
 	struct mlx5_vhca_data_buffer *buf[MAX_NUM_CHUNKS];
 	struct mlx5_vhca_data_buffer *buf_header[MAX_NUM_CHUNKS];
+	struct mlx5vf_save_work_data save_data[MAX_NUM_CHUNKS];
 	spinlock_t list_lock;
 	struct list_head buf_list;
 	struct list_head avail_list;
@@ -223,6 +233,8 @@ struct page *mlx5vf_get_migration_page(struct mlx5_vhca_data_buffer *buf,
 void mlx5vf_state_mutex_unlock(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_disable_fds(struct mlx5vf_pci_core_device *mvdev);
 void mlx5vf_mig_file_cleanup_cb(struct work_struct *_work);
+void mlx5vf_mig_file_set_save_work(struct mlx5_vf_migration_file *migf,
+				   u8 chunk_num, size_t next_required_umem_size);
 int mlx5vf_start_page_tracker(struct vfio_device *vdev,
 		struct rb_root_cached *ranges, u32 nnodes, u64 *page_size);
 int mlx5vf_stop_page_tracker(struct vfio_device *vdev);
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index 351b61303b72..c80caf55499f 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -306,6 +306,73 @@ static void mlx5vf_mark_err(struct mlx5_vf_migration_file *migf)
 	wake_up_interruptible(&migf->poll_wait);
 }
 
+void mlx5vf_mig_file_set_save_work(struct mlx5_vf_migration_file *migf,
+				   u8 chunk_num, size_t next_required_umem_size)
+{
+	migf->save_data[chunk_num - 1].next_required_umem_size =
+			next_required_umem_size;
+	migf->save_data[chunk_num - 1].migf = migf;
+	get_file(migf->filp);
+	queue_work(migf->mvdev->cb_wq,
+		   &migf->save_data[chunk_num - 1].work);
+}
+
+static struct mlx5_vhca_data_buffer *
+mlx5vf_mig_file_get_stop_copy_buf(struct mlx5_vf_migration_file *migf,
+				  u8 index, size_t required_length)
+{
+	struct mlx5_vhca_data_buffer *buf = migf->buf[index];
+	u8 chunk_num;
+
+	WARN_ON(!buf);
+	chunk_num = buf->stop_copy_chunk_num;
+	buf->migf->buf[index] = NULL;
+	/* Checking whether the pre-allocated buffer can fit */
+	if (buf->allocated_length >= required_length)
+		return buf;
+
+	mlx5vf_put_data_buffer(buf);
+	buf = mlx5vf_get_data_buffer(buf->migf, required_length,
+				     DMA_FROM_DEVICE);
+	if (IS_ERR(buf))
+		return buf;
+
+	buf->stop_copy_chunk_num = chunk_num;
+	return buf;
+}
+
+static void mlx5vf_mig_file_save_work(struct work_struct *_work)
+{
+	struct mlx5vf_save_work_data *save_data = container_of(_work,
+		struct mlx5vf_save_work_data, work);
+	struct mlx5_vf_migration_file *migf = save_data->migf;
+	struct mlx5vf_pci_core_device *mvdev = migf->mvdev;
+	struct mlx5_vhca_data_buffer *buf;
+
+	mutex_lock(&mvdev->state_mutex);
+	if (migf->state == MLX5_MIGF_STATE_ERROR)
+		goto end;
+
+	buf = mlx5vf_mig_file_get_stop_copy_buf(migf,
+				save_data->chunk_num - 1,
+				save_data->next_required_umem_size);
+	if (IS_ERR(buf))
+		goto err;
+
+	if (mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, false))
+		goto err_save;
+
+	goto end;
+
+err_save:
+	mlx5vf_put_data_buffer(buf);
+err:
+	mlx5vf_mark_err(migf);
+end:
+	mlx5vf_state_mutex_unlock(mvdev);
+	fput(migf->filp);
+}
+
 static int mlx5vf_add_stop_copy_header(struct mlx5_vf_migration_file *migf,
 				       bool track)
 {
@@ -400,6 +467,9 @@ static int mlx5vf_prep_stop_copy(struct mlx5vf_pci_core_device *mvdev,
 		if (mvdev->chunk_mode) {
 			migf->buf[i]->stop_copy_chunk_num = i + 1;
 			migf->buf_header[i]->stop_copy_chunk_num = i + 1;
+			INIT_WORK(&migf->save_data[i].work,
+				  mlx5vf_mig_file_save_work);
+			migf->save_data[i].chunk_num = i + 1;
 		}
 	}
 
@@ -548,16 +618,10 @@ static int mlx5vf_pci_save_device_inc_data(struct mlx5vf_pci_core_device *mvdev)
 	if (ret)
 		goto err;
 
-	/* Checking whether we have a matching pre-allocated buffer that can fit */
-	if (migf->buf[0]->allocated_length >= length) {
-		buf = migf->buf[0];
-		migf->buf[0] = NULL;
-	} else {
-		buf = mlx5vf_get_data_buffer(migf, length, DMA_FROM_DEVICE);
-		if (IS_ERR(buf)) {
-			ret = PTR_ERR(buf);
-			goto err;
-		}
+	buf = mlx5vf_mig_file_get_stop_copy_buf(migf, 0, length);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		goto err;
 	}
 
 	ret = mlx5vf_cmd_save_vhca_state(mvdev, migf, buf, true, false);
-- 
2.18.1


* [PATCH vfio 8/9] vfio/mlx5: Add support for READING in chunk mode
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (6 preceding siblings ...)
  2023-09-11  9:38 ` [PATCH vfio 7/9] vfio/mlx5: Add support for SAVING in chunk mode Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-11  9:38 ` [PATCH vfio 9/9] vfio/mlx5: Activate the chunk mode functionality Yishai Hadas
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

Add support for READING in chunk mode.

In case the last SAVE command recognized that there was still some
image left to be read but there was no available chunk to use for it,
this task is delayed for the reader until one chunk is consumed and
becomes available.

In the above case, a work will be executed to read the next image from
the device in the background.
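
In short, once the reader fully consumes a chunk's data buffer, the
delayed SAVE is re-armed (a sketch matching mlx5vf_buf_read_done()
below):

	/* under list_lock: take the parked request and free a slot */
	next = migf->next_required_umem_size;
	migf->next_required_umem_size = 0;
	migf->num_ready_chunks--;
	/* ... */
	if (next)
		mlx5vf_mig_file_set_save_work(migf, chunk_num, next);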

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/main.c | 43 +++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index c80caf55499f..b6ac66c5008d 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -160,6 +160,41 @@ mlx5vf_get_data_buff_from_pos(struct mlx5_vf_migration_file *migf, loff_t pos,
 	return found ? buf : NULL;
 }
 
+static void mlx5vf_buf_read_done(struct mlx5_vhca_data_buffer *vhca_buf)
+{
+	struct mlx5_vf_migration_file *migf = vhca_buf->migf;
+
+	if (vhca_buf->stop_copy_chunk_num) {
+		bool is_header = vhca_buf->dma_dir == DMA_NONE;
+		u8 chunk_num = vhca_buf->stop_copy_chunk_num;
+		size_t next_required_umem_size = 0;
+
+		if (is_header)
+			migf->buf_header[chunk_num - 1] = vhca_buf;
+		else
+			migf->buf[chunk_num - 1] = vhca_buf;
+
+		spin_lock_irq(&migf->list_lock);
+		list_del_init(&vhca_buf->buf_elm);
+		if (!is_header) {
+			next_required_umem_size =
+				migf->next_required_umem_size;
+			migf->next_required_umem_size = 0;
+			migf->num_ready_chunks--;
+		}
+		spin_unlock_irq(&migf->list_lock);
+		if (next_required_umem_size)
+			mlx5vf_mig_file_set_save_work(migf, chunk_num,
+						      next_required_umem_size);
+		return;
+	}
+
+	spin_lock_irq(&migf->list_lock);
+	list_del_init(&vhca_buf->buf_elm);
+	list_add_tail(&vhca_buf->buf_elm, &vhca_buf->migf->avail_list);
+	spin_unlock_irq(&migf->list_lock);
+}
+
 static ssize_t mlx5vf_buf_read(struct mlx5_vhca_data_buffer *vhca_buf,
 			       char __user **buf, size_t *len, loff_t *pos)
 {
@@ -195,12 +230,8 @@ static ssize_t mlx5vf_buf_read(struct mlx5_vhca_data_buffer *vhca_buf,
 		copy_len -= page_len;
 	}
 
-	if (*pos >= vhca_buf->start_pos + vhca_buf->length) {
-		spin_lock_irq(&vhca_buf->migf->list_lock);
-		list_del_init(&vhca_buf->buf_elm);
-		list_add_tail(&vhca_buf->buf_elm, &vhca_buf->migf->avail_list);
-		spin_unlock_irq(&vhca_buf->migf->list_lock);
-	}
+	if (*pos >= vhca_buf->start_pos + vhca_buf->length)
+		mlx5vf_buf_read_done(vhca_buf);
 
 	return done;
 }
-- 
2.18.1


* [PATCH vfio 9/9] vfio/mlx5: Activate the chunk mode functionality
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (7 preceding siblings ...)
  2023-09-11  9:38 ` [PATCH vfio 8/9] vfio/mlx5: Add support for READING " Yishai Hadas
@ 2023-09-11  9:38 ` Yishai Hadas
  2023-09-20 18:31 ` [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Jason Gunthorpe
  2023-10-02  8:47 ` (subset) " Leon Romanovsky
  10 siblings, 0 replies; 20+ messages in thread
From: Yishai Hadas @ 2023-09-11  9:38 UTC (permalink / raw)
  To: alex.williamson, jgg
  Cc: kvm, kevin.tian, joao.m.martins, leonro, yishaih, maorg

Now that all pieces are in place, activate the chunk mode functionality
based on device capabilities.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
---
 drivers/vfio/pci/mlx5/cmd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vfio/pci/mlx5/cmd.c b/drivers/vfio/pci/mlx5/cmd.c
index e68bf9ba5300..efd1d252cdc9 100644
--- a/drivers/vfio/pci/mlx5/cmd.c
+++ b/drivers/vfio/pci/mlx5/cmd.c
@@ -261,6 +261,9 @@ void mlx5vf_cmd_set_migratable(struct mlx5vf_pci_core_device *mvdev,
 		mvdev->core_device.vdev.migration_flags |=
 			VFIO_MIGRATION_PRE_COPY;
 
+	if (MLX5_CAP_GEN_2(mvdev->mdev, migration_in_chunks))
+		mvdev->chunk_mode = 1;
+
 end:
 	mlx5_vf_put_core_dev(mvdev->mdev);
 }
-- 
2.18.1


* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (8 preceding siblings ...)
  2023-09-11  9:38 ` [PATCH vfio 9/9] vfio/mlx5: Activate the chunk mode functionality Yishai Hadas
@ 2023-09-20 18:31 ` Jason Gunthorpe
  2023-09-27 10:59   ` Yishai Hadas
  2023-10-02  8:47 ` (subset) " Leon Romanovsky
  10 siblings, 1 reply; 20+ messages in thread
From: Jason Gunthorpe @ 2023-09-20 18:31 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: alex.williamson, kvm, kevin.tian, joao.m.martins, leonro, maorg

On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:
> This series adds 'chunk mode' support to the mlx5 driver for the
> migration flow.
> 
> Before this series, we were limited to a 4GB state size, since the
> device specification defines a 4-byte max size field for the
> query/save/load commands.
> 
> Once the device supports 'chunk mode', the driver can support a state
> size larger than 4GB.
> 
> In that case, the device can split a single image into multiple
> chunks, as long as the software provides a buffer of at least the
> minimum size reported by the device.
> 
> The driver queries the minimum required buffer size using the
> QUERY_VHCA_MIGRATION_STATE command with the 'chunk' bit set in its
> input; in that case, the output includes both the minimum buffer size
> and the remaining total size, to be reported/used where applicable.
> 
> In chunk mode, there may be multiple images to read from the device
> during STOP_COPY. The driver reads the full state ahead from the
> firmware in small/optimized chunks while letting QEMU/user space read
> the available data in parallel.
> 
> The chunk buffer size is picked based on the minimum size that the
> firmware requires, the total full size, and a max value in the driver
> code, which was set to 8MB to achieve an optimized downtime in the
> general case.
> 
> With this series in place, we could successfully migrate a device
> state larger than 4GB, while even improving the downtime in some
> scenarios.
> 
> Note:
> As the first patch should go through net/mlx5, we may need to send it
> as a pull request to VFIO to avoid conflicts before acceptance.
> 
> Yishai
> 
> Yishai Hadas (9):
>   net/mlx5: Introduce ifc bits for migration in a chunk mode
>   vfio/mlx5: Wake up the reader post of disabling the SAVING migration
>     file
>   vfio/mlx5: Refactor the SAVE callback to activate a work only upon an
>     error
>   vfio/mlx5: Enable querying state size which is > 4GB
>   vfio/mlx5: Rename some stuff to match chunk mode
>   vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase
>   vfio/mlx5: Add support for SAVING in chunk mode
>   vfio/mlx5: Add support for READING in chunk mode
>   vfio/mlx5: Activate the chunk mode functionality

I didn't check in great depth, but this looks OK to me.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

I think this is a good design to start motivating more QEMU
improvements, e.g. using io_uring, as we could go further in the driver
to optimize with that kind of support.

Jason


* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-20 18:31 ` [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Jason Gunthorpe
@ 2023-09-27 10:59   ` Yishai Hadas
  2023-09-27 22:10     ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Yishai Hadas @ 2023-09-27 10:59 UTC (permalink / raw)
  To: Jason Gunthorpe, Alex Williamson
  Cc: kvm, kevin.tian, joao.m.martins, leonro, maorg

On 20/09/2023 21:31, Jason Gunthorpe wrote:
> On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:
>> This series adds 'chunk mode' support to the mlx5 driver for the
>> migration flow.
>>
>> Before this series, we were limited to a 4GB state size, since the
>> device specification defines a 4-byte max size field for the
>> query/save/load commands.
>>
>> Once the device supports 'chunk mode', the driver can support a state
>> size larger than 4GB.
>>
>> In that case, the device can split a single image into multiple
>> chunks, as long as the software provides a buffer of at least the
>> minimum size reported by the device.
>>
>> The driver queries the minimum required buffer size using the
>> QUERY_VHCA_MIGRATION_STATE command with the 'chunk' bit set in its
>> input; in that case, the output includes both the minimum buffer size
>> and the remaining total size, to be reported/used where applicable.
>>
>> In chunk mode, there may be multiple images to read from the device
>> during STOP_COPY. The driver reads the full state ahead from the
>> firmware in small/optimized chunks while letting QEMU/user space read
>> the available data in parallel.
>>
>> The chunk buffer size is picked based on the minimum size that the
>> firmware requires, the total full size, and a max value in the driver
>> code, which was set to 8MB to achieve an optimized downtime in the
>> general case.
>>
>> With this series in place, we could successfully migrate a device
>> state larger than 4GB, while even improving the downtime in some
>> scenarios.
>>
>> Note:
>> As the first patch should go through net/mlx5, we may need to send it
>> as a pull request to VFIO to avoid conflicts before acceptance.
>>
>> Yishai
>>
>> Yishai Hadas (9):
>>    net/mlx5: Introduce ifc bits for migration in a chunk mode
>>    vfio/mlx5: Wake up the reader post of disabling the SAVING migration
>>      file
>>    vfio/mlx5: Refactor the SAVE callback to activate a work only upon an
>>      error
>>    vfio/mlx5: Enable querying state size which is > 4GB
>>    vfio/mlx5: Rename some stuff to match chunk mode
>>    vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase
>>    vfio/mlx5: Add support for SAVING in chunk mode
>>    vfio/mlx5: Add support for READING in chunk mode
>>    vfio/mlx5: Activate the chunk mode functionality
> I didn't check in great depth, but this looks OK to me.
>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks Jason

>
> I think this is a good design to start motivating more QEMU
> improvements, e.g. using io_uring, as we could go further in the driver
> to optimize with that kind of support.
>
> Jason

Alex,

Can we move forward with the series and send a PR for the first patch,
which also needs to go to net/mlx5?

Thanks,
Yishai



* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-27 10:59   ` Yishai Hadas
@ 2023-09-27 22:10     ` Alex Williamson
  2023-09-28 11:08       ` Leon Romanovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2023-09-27 22:10 UTC (permalink / raw)
  To: Yishai Hadas
  Cc: Jason Gunthorpe, kvm, kevin.tian, joao.m.martins, leonro, maorg

On Wed, 27 Sep 2023 13:59:06 +0300
Yishai Hadas <yishaih@nvidia.com> wrote:

> On 20/09/2023 21:31, Jason Gunthorpe wrote:
> > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:  
> >> This series adds 'chunk mode' support to the mlx5 driver for the
> >> migration flow.
> >>
> >> Before this series, we were limited to a 4GB state size, since the
> >> device specification defines a 4-byte max size field for the
> >> query/save/load commands.
> >>
> >> Once the device supports 'chunk mode', the driver can support a
> >> state size larger than 4GB.
> >>
> >> In that case, the device can split a single image into multiple
> >> chunks, as long as the software provides a buffer of at least the
> >> minimum size reported by the device.
> >>
> >> The driver queries the minimum required buffer size using the
> >> QUERY_VHCA_MIGRATION_STATE command with the 'chunk' bit set in its
> >> input; in that case, the output includes both the minimum buffer
> >> size and the remaining total size, to be reported/used where
> >> applicable.
> >>
> >> In chunk mode, there may be multiple images to read from the device
> >> during STOP_COPY. The driver reads the full state ahead from the
> >> firmware in small/optimized chunks while letting QEMU/user space
> >> read the available data in parallel.
> >>
> >> The chunk buffer size is picked based on the minimum size that the
> >> firmware requires, the total full size, and a max value in the
> >> driver code, which was set to 8MB to achieve an optimized downtime
> >> in the general case.
> >>
> >> With this series in place, we could successfully migrate a device
> >> state larger than 4GB, while even improving the downtime in some
> >> scenarios.
> >>
> >> Note:
> >> As the first patch should go through net/mlx5, we may need to send
> >> it as a pull request to VFIO to avoid conflicts before acceptance.
> >>
> >> Yishai
> >>
> >> Yishai Hadas (9):
> >>    net/mlx5: Introduce ifc bits for migration in a chunk mode
> >>    vfio/mlx5: Wake up the reader post of disabling the SAVING migration
> >>      file
> >>    vfio/mlx5: Refactor the SAVE callback to activate a work only upon an
> >>      error
> >>    vfio/mlx5: Enable querying state size which is > 4GB
> >>    vfio/mlx5: Rename some stuff to match chunk mode
> >>    vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase
> >>    vfio/mlx5: Add support for SAVING in chunk mode
> >>    vfio/mlx5: Add support for READING in chunk mode
> >>    vfio/mlx5: Activate the chunk mode functionality  
> > I didn't check in great depth but this looks OK to me
> >
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>  
> 
> Thanks Jason
> 
> >
> > I think this is a good design to start motivating more QEMU
> > improvements, e.g. using io_uring, as we could go further in the
> > driver to optimize with that kind of support.
> >
> > Jason  
> 
> Alex,
> 
> Can we move forward with the series and send a PR for the first patch
> that also needs to go to net/mlx5?

Yeah, I don't spot any issues with it either.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-27 22:10     ` Alex Williamson
@ 2023-09-28 11:08       ` Leon Romanovsky
  2023-09-28 18:29         ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Leon Romanovsky @ 2023-09-28 11:08 UTC (permalink / raw
  To: Alex Williamson
  Cc: Yishai Hadas, Jason Gunthorpe, kvm, kevin.tian, joao.m.martins,
	maorg

On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:
> On Wed, 27 Sep 2023 13:59:06 +0300
> Yishai Hadas <yishaih@nvidia.com> wrote:
> 
> > On 20/09/2023 21:31, Jason Gunthorpe wrote:
> > > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:  
> > >> This series adds 'chunk mode' support for mlx5 driver upon the migration
> > >> flow.
> > >>
> > >> [...]
> > > I didn't check in great depth but this looks OK to me
> > >
> > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>  
> > 
> > Thanks Jason
> > 
> > >
> > > I think this is a good design to start motivating more QEMU
> > > improvements, e.g. using io_uring, as we could go further in the
> > > driver to optimize with that kind of support.
> > >
> > > Jason  
> > 
> > Alex,
> > 
> > Can we move forward with the series and send a PR for the first patch
> > that also needs to go to net/mlx5?
> 
> Yeah, I don't spot any issues with it either.  Thanks,

Hi Alex,

I uploaded the first patch to the shared branch; can you please pull it?
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio

Thanks

> 
> Alex
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-28 11:08       ` Leon Romanovsky
@ 2023-09-28 18:29         ` Alex Williamson
  2023-09-28 18:42           ` Leon Romanovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2023-09-28 18:29 UTC (permalink / raw
  To: Leon Romanovsky
  Cc: Yishai Hadas, Jason Gunthorpe, kvm, kevin.tian, joao.m.martins,
	maorg

On Thu, 28 Sep 2023 14:08:08 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:
> > On Wed, 27 Sep 2023 13:59:06 +0300
> > Yishai Hadas <yishaih@nvidia.com> wrote:
> >   
> > > On 20/09/2023 21:31, Jason Gunthorpe wrote:  
> > > > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:    
> > > >> This series adds 'chunk mode' support for mlx5 driver upon the migration
> > > >> flow.
> > > >>
> > > >> [...]
> > > > I didn't check in great depth but this looks OK to me
> > > >
> > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>    
> > > 
> > > Thanks Jason
> > >   
> > > >
> > > > I think this is a good design to start motivating more QEMU
> > > > improvements, e.g. using io_uring, as we could go further in the
> > > > driver to optimize with that kind of support.
> > > >
> > > > Jason    
> > > 
> > > Alex,
> > > 
> > > Can we move forward with the series and send a PR for the first patch
> > > that also needs to go to net/mlx5?
> > 
> > Yeah, I don't spot any issues with it either.  Thanks,  
> 
> Hi Alex,
> 
> I uploaded the first patch to the shared branch; can you please pull it?
> https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio

Yep, got it.  Thanks.

Yishai, were you planning to resend the remainder or do you just want
me to pull 2-9 from this series?  Thanks,

Alex


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-28 18:29         ` Alex Williamson
@ 2023-09-28 18:42           ` Leon Romanovsky
  2023-09-28 18:47             ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Leon Romanovsky @ 2023-09-28 18:42 UTC (permalink / raw
  To: Alex Williamson
  Cc: Yishai Hadas, Jason Gunthorpe, kvm, kevin.tian, joao.m.martins,
	maorg

On Thu, Sep 28, 2023 at 12:29:52PM -0600, Alex Williamson wrote:
> On Thu, 28 Sep 2023 14:08:08 +0300
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:
> > > On Wed, 27 Sep 2023 13:59:06 +0300
> > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > >   
> > > > On 20/09/2023 21:31, Jason Gunthorpe wrote:  
> > > > > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:    
> > > > >> This series adds 'chunk mode' support for mlx5 driver upon the migration
> > > > >> flow.
> > > > >>
> > > > >> [...]
> > > > > I didn't check in great depth but this looks OK to me
> > > > >
> > > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>    
> > > > 
> > > > Thanks Jason
> > > >   
> > > > >
> > > > > I think this is a good design to start motivating more QEMU
> > > > > improvements, e.g. using io_uring, as we could go further in the
> > > > > driver to optimize with that kind of support.
> > > > >
> > > > > Jason    
> > > > 
> > > > Alex,
> > > > 
> > > > Can we move forward with the series and send a PR for the first patch
> > > > that also needs to go to net/mlx5?
> > > 
> > > Yeah, I don't spot any issues with it either.  Thanks,  
> > 
> > Hi Alex,
> > 
> > I uploaded the first patch to the shared branch; can you please pull it?
> > https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio
> 
> Yep, got it.  Thanks.
> 
> Yishai, were you planning to resend the remainder or do you just want
> me to pull 2-9 from this series?  Thanks,

Just pull, like I did with b4 :)

~/src/b4/b4.sh shazam -l -s https://lore.kernel.org/kvm/20230911093856.81910-1-yishaih@nvidia.com/ -P 2-9 -t
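
(For readers unfamiliar with b4's options: -l adds Link: trailers
pointing back to lore, -s adds the applier's Signed-off-by, -P 2-9
cherry-picks only patches 2-9 from the series, and -t applies trailers
sent in reply to the cover letter, such as the Reviewed-by above.)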

Thanks

> 
> Alex
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-28 18:42           ` Leon Romanovsky
@ 2023-09-28 18:47             ` Alex Williamson
  2023-09-28 18:51               ` Leon Romanovsky
  0 siblings, 1 reply; 20+ messages in thread
From: Alex Williamson @ 2023-09-28 18:47 UTC (permalink / raw
  To: Leon Romanovsky
  Cc: Yishai Hadas, Jason Gunthorpe, kvm, kevin.tian, joao.m.martins,
	maorg

On Thu, 28 Sep 2023 21:42:22 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Thu, Sep 28, 2023 at 12:29:52PM -0600, Alex Williamson wrote:
> > On Thu, 28 Sep 2023 14:08:08 +0300
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:  
> > > > On Wed, 27 Sep 2023 13:59:06 +0300
> > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > >     
> > > > > On 20/09/2023 21:31, Jason Gunthorpe wrote:    
> > > > > > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:      
> > > > > >> This series adds 'chunk mode' support for mlx5 driver upon the migration
> > > > > >> flow.
> > > > > >>
> > > > > >> [...]
> > > > > > I didn't check in great depth but this looks OK to me
> > > > > >
> > > > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>      
> > > > > 
> > > > > Thanks Jason
> > > > >     
> > > > > >
> > > > > > I think this is a good design to start motivating more QEMU
> > > > > > improvements, e.g. using io_uring, as we could go further in the
> > > > > > driver to optimize with that kind of support.
> > > > > >
> > > > > > Jason      
> > > > > 
> > > > > Alex,
> > > > > 
> > > > > Can we move forward with the series and send a PR for the first patch
> > > > > that also needs to go to net/mlx5?
> > > > 
> > > > Yeah, I don't spot any issues with it either.  Thanks,    
> > > 
> > > Hi Alex,
> > > 
> > > I uploaded the first patch to the shared branch; can you please pull it?
> > > https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio
> > 
> > Yep, got it.  Thanks.
> > 
> > Yishai, were you planning to resend the remainder or do you just want
> > me to pull 2-9 from this series?  Thanks,  
> 
> Just pull, like I did with b4 :)
> 
> ~/src/b4/b4.sh shazam -l -s https://lore.kernel.org/kvm/20230911093856.81910-1-yishaih@nvidia.com/ -P 2-9 -t

Yep, the mechanics were really not the question; I'm just double
checking to avoid any conflicts with a re-post.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-28 18:47             ` Alex Williamson
@ 2023-09-28 18:51               ` Leon Romanovsky
  2023-09-28 21:12                 ` Alex Williamson
  0 siblings, 1 reply; 20+ messages in thread
From: Leon Romanovsky @ 2023-09-28 18:51 UTC (permalink / raw
  To: Alex Williamson
  Cc: Yishai Hadas, Jason Gunthorpe, kvm, kevin.tian, joao.m.martins,
	maorg

On Thu, Sep 28, 2023 at 12:47:03PM -0600, Alex Williamson wrote:
> On Thu, 28 Sep 2023 21:42:22 +0300
> Leon Romanovsky <leon@kernel.org> wrote:
> 
> > On Thu, Sep 28, 2023 at 12:29:52PM -0600, Alex Williamson wrote:
> > > On Thu, 28 Sep 2023 14:08:08 +0300
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >   
> > > > On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:  
> > > > > On Wed, 27 Sep 2023 13:59:06 +0300
> > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > >     
> > > > > > On 20/09/2023 21:31, Jason Gunthorpe wrote:    
> > > > > > > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:      
> > > > > > >> This series adds 'chunk mode' support for mlx5 driver upon the migration
> > > > > > >> flow.
> > > > > > >>
> > > > > > >> [...]
> > > > > > > I didn't check in great depth but this looks OK to me
> > > > > > >
> > > > > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>      
> > > > > > 
> > > > > > Thanks Jason
> > > > > >     
> > > > > > >
> > > > > > > I think this is a good design to start motivating more QEMU
> > > > > > > improvements, e.g. using io_uring, as we could go further in the
> > > > > > > driver to optimize with that kind of support.
> > > > > > >
> > > > > > > Jason      
> > > > > > 
> > > > > > Alex,
> > > > > > 
> > > > > > Can we move forward with the series and send a PR for the first patch
> > > > > > that also needs to go to net/mlx5?
> > > > > 
> > > > > Yeah, I don't spot any issues with it either.  Thanks,    
> > > > 
> > > > Hi Alex,
> > > > 
> > > > I uploaded the first patch to the shared branch; can you please pull it?
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio
> > > 
> > > Yep, got it.  Thanks.
> > > 
> > > Yishai, were you planning to resend the remainder or do you just want
> > > me to pull 2-9 from this series?  Thanks,  
> > 
> > Just pull, like I did with b4 :)
> > 
> > ~/src/b4/b4.sh shazam -l -s https://lore.kernel.org/kvm/20230911093856.81910-1-yishaih@nvidia.com/ -P 2-9 -t
> 
> Yep, the mechanics were really not the question; I'm just double
> checking to avoid any conflicts with a re-post.  Thanks,

It is pretty safe to say that he won't re-post; he had no plans to
resend the series.

Thanks

> 
> Alex
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-28 18:51               ` Leon Romanovsky
@ 2023-09-28 21:12                 ` Alex Williamson
  0 siblings, 0 replies; 20+ messages in thread
From: Alex Williamson @ 2023-09-28 21:12 UTC (permalink / raw
  To: Leon Romanovsky
  Cc: Yishai Hadas, Jason Gunthorpe, kvm, kevin.tian, joao.m.martins,
	maorg

On Thu, 28 Sep 2023 21:51:02 +0300
Leon Romanovsky <leon@kernel.org> wrote:

> On Thu, Sep 28, 2023 at 12:47:03PM -0600, Alex Williamson wrote:
> > On Thu, 28 Sep 2023 21:42:22 +0300
> > Leon Romanovsky <leon@kernel.org> wrote:
> >   
> > > On Thu, Sep 28, 2023 at 12:29:52PM -0600, Alex Williamson wrote:  
> > > > On Thu, 28 Sep 2023 14:08:08 +0300
> > > > Leon Romanovsky <leon@kernel.org> wrote:
> > > >     
> > > > > On Wed, Sep 27, 2023 at 04:10:23PM -0600, Alex Williamson wrote:    
> > > > > > On Wed, 27 Sep 2023 13:59:06 +0300
> > > > > > Yishai Hadas <yishaih@nvidia.com> wrote:
> > > > > >       
> > > > > > > On 20/09/2023 21:31, Jason Gunthorpe wrote:      
> > > > > > > > On Mon, Sep 11, 2023 at 12:38:47PM +0300, Yishai Hadas wrote:        
> > > > > > > >> This series adds 'chunk mode' support for mlx5 driver upon the migration
> > > > > > > >> flow.
> > > > > > > >>
> > > > > > > >> [...]
> > > > > > > > I didn't check in great depth but this looks OK to me
> > > > > > > >
> > > > > > > > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>        
> > > > > > > 
> > > > > > > Thanks Jason
> > > > > > >       
> > > > > > > >
> > > > > > > > I think this is a good design to start motivating more QEMU
> > > > > > > > improvements, e.g. using io_uring, as we could go further in the
> > > > > > > > driver to optimize with that kind of support.
> > > > > > > >
> > > > > > > > Jason        
> > > > > > > 
> > > > > > > Alex,
> > > > > > > 
> > > > > > > Can we move forward with the series and send a PR for the first patch
> > > > > > > that also needs to go to net/mlx5?
> > > > > > 
> > > > > > Yeah, I don't spot any issues with it either.  Thanks,      
> > > > > 
> > > > > Hi Alex,
> > > > > 
> > > > > I uploaded the first patch to the shared branch; can you please pull it?
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vfio
> > > > 
> > > > Yep, got it.  Thanks.
> > > > 
> > > > Yishai, were you planning to resend the remainder or do you just want
> > > > me to pull 2-9 from this series?  Thanks,    
> > > 
> > > Just pull, like I did with b4 :)
> > > 
> > > ~/src/b4/b4.sh shazam -l -s https://lore.kernel.org/kvm/20230911093856.81910-1-yishaih@nvidia.com/ -P 2-9 -t  
> > 
> > Yep, the mechanics were really not the question; I'm just double
> > checking to avoid any conflicts with a re-post.  Thanks,
> 
> It is pretty safe to say that he won't re-post; he had no plans to
> resend the series.

Ok, applied the remainder of the series to the vfio next branch for
v6.7.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: (subset) [PATCH vfio 0/9] Add chunk mode support for mlx5 driver
  2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
                   ` (9 preceding siblings ...)
  2023-09-20 18:31 ` [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Jason Gunthorpe
@ 2023-10-02  8:47 ` Leon Romanovsky
  10 siblings, 0 replies; 20+ messages in thread
From: Leon Romanovsky @ 2023-10-02  8:47 UTC (permalink / raw
  To: alex.williamson, Jason Gunthorpe, Yishai Hadas
  Cc: kvm, kevin.tian, joao.m.martins, maorg, Leon Romanovsky


On Mon, 11 Sep 2023 12:38:47 +0300, Yishai Hadas wrote:
> This series adds 'chunk mode' support for mlx5 driver upon the migration
> flow.
> 
> Before this series, we were limited to 4GB state size, as of the 4 bytes
> max value based on the device specification for the query/save/load
> commands.
> 
> [...]

Applied, thanks!

[1/9] net/mlx5: Introduce ifc bits for migration in a chunk mode
      https://git.kernel.org/rdma/rdma/c/5aa4c9608d2d5f

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-10-02  8:48 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11  9:38 [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 1/9] net/mlx5: Introduce ifc bits for migration in a chunk mode Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 2/9] vfio/mlx5: Wake up the reader post of disabling the SAVING migration file Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 3/9] vfio/mlx5: Refactor the SAVE callback to activate a work only upon an error Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 4/9] vfio/mlx5: Enable querying state size which is > 4GB Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 5/9] vfio/mlx5: Rename some stuff to match chunk mode Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 6/9] vfio/mlx5: Pre-allocate chunks for the STOP_COPY phase Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 7/9] vfio/mlx5: Add support for SAVING in chunk mode Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 8/9] vfio/mlx5: Add support for READING " Yishai Hadas
2023-09-11  9:38 ` [PATCH vfio 9/9] vfio/mlx5: Activate the chunk mode functionality Yishai Hadas
2023-09-20 18:31 ` [PATCH vfio 0/9] Add chunk mode support for mlx5 driver Jason Gunthorpe
2023-09-27 10:59   ` Yishai Hadas
2023-09-27 22:10     ` Alex Williamson
2023-09-28 11:08       ` Leon Romanovsky
2023-09-28 18:29         ` Alex Williamson
2023-09-28 18:42           ` Leon Romanovsky
2023-09-28 18:47             ` Alex Williamson
2023-09-28 18:51               ` Leon Romanovsky
2023-09-28 21:12                 ` Alex Williamson
2023-10-02  8:47 ` (subset) " Leon Romanovsky

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.