* [PATCH v4 0/7] Submit zoned writes in order
@ 2023-06-21 20:12 Bart Van Assche
  2023-06-21 20:12 ` [PATCH v4 1/7] block: Rename a local variable in blk_mq_requeue_work() Bart Van Assche
                   ` (6 more replies)
  0 siblings, 7 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe; +Cc: linux-block, Christoph Hellwig, Bart Van Assche

Hi Jens,

Tests with a zoned UFS prototype have shown that the block layer may reorder
zoned writes (REQ_OP_WRITE). The UFS driver is more likely than other SCSI
drivers to trigger such reordering because it reports BLK_STS_DEV_RESOURCE
more often, e.g. during clock scaling. This patch series ensures that zoned
writes are submitted in order without significantly affecting other workloads.

Please consider this patch series for the next merge window.

Thanks,

Bart.

Changes compared to v3:
- Dropped the changes for sending already dispatched requests back to the I/O
  scheduler if the block driver is busy.
- Dropped several patches that aren't needed to preserve the order of zoned
  writes.
- Only send requeued writes to the I/O scheduler. Send requeued reads straight
  to the dispatch queue.
- Changed the approach from one requeue list per request queue into one requeue
  list per hardware queue.

Changes compared to v2:
- Changed the approach from one requeue list per hctx into preserving one
  requeue list per request queue.
- Rebased on top of Jens' for-next branch. Left out the mq-deadline patches
  since these are already in the for-next branch.
- Modified patch "block: Requeue requests if a CPU is unplugged" such that it
  always uses the requeue list.
- Added a patch that removes blk_mq_kick_requeue_list() and
  blk_mq_delay_kick_requeue_list().
- Dropped patch "block: mq-deadline: Disable head insertion for zoned writes".
- Dropped patch "block: mq-deadline: Introduce a local variable".

Changes compared to v1:
- Fixed two issues detected by the kernel test robot.

Bart Van Assche (7):
  block: Rename a local variable in blk_mq_requeue_work()
  block: Simplify blk_mq_requeue_work()
  block: Send requeued requests to the I/O scheduler
  block: One requeue list per hctx
  block: Preserve the order of requeued requests
  dm: Inline __dm_mq_kick_requeue_list()
  block: Inline blk_mq_{,delay_}kick_requeue_list()

 block/blk-flush.c            | 28 +++++++------
 block/blk-mq-debugfs.c       | 66 ++++++++++++++---------------
 block/blk-mq.c               | 81 ++++++++++++++----------------------
 drivers/block/ublk_drv.c     |  6 +--
 drivers/block/xen-blkfront.c |  1 -
 drivers/md/dm-rq.c           | 11 ++---
 drivers/nvme/host/core.c     |  2 +-
 drivers/s390/block/scm_blk.c |  2 +-
 drivers/scsi/scsi_lib.c      |  2 +-
 include/linux/blk-mq.h       | 11 +++--
 include/linux/blkdev.h       |  5 ---
 11 files changed, 96 insertions(+), 119 deletions(-)



* [PATCH v4 1/7] block: Rename a local variable in blk_mq_requeue_work()
  2023-06-21 20:12 [PATCH v4 0/7] Submit zoned writes in order Bart Van Assche
@ 2023-06-21 20:12 ` Bart Van Assche
  2023-06-22  1:15   ` Damien Le Moal
  2023-06-21 20:12 ` [PATCH v4 2/7] block: Simplify blk_mq_requeue_work() Bart Van Assche
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche,
	Johannes Thumshirn, Damien Le Moal, Ming Lei, Mike Snitzer

Two data structures in blk_mq_requeue_work() represent request lists. Make
it clear that rq_list holds requests that come from the requeue list by
renaming that data structure.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 720b5061ffe8..41ee393c80a9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1441,17 +1441,17 @@ static void blk_mq_requeue_work(struct work_struct *work)
 {
 	struct request_queue *q =
 		container_of(work, struct request_queue, requeue_work.work);
-	LIST_HEAD(rq_list);
+	LIST_HEAD(requeue_list);
 	LIST_HEAD(flush_list);
 	struct request *rq;
 
 	spin_lock_irq(&q->requeue_lock);
-	list_splice_init(&q->requeue_list, &rq_list);
+	list_splice_init(&q->requeue_list, &requeue_list);
 	list_splice_init(&q->flush_list, &flush_list);
 	spin_unlock_irq(&q->requeue_lock);
 
-	while (!list_empty(&rq_list)) {
-		rq = list_entry(rq_list.next, struct request, queuelist);
+	while (!list_empty(&requeue_list)) {
+		rq = list_entry(requeue_list.next, struct request, queuelist);
 		/*
 		 * If RQF_DONTPREP ist set, the request has been started by the
 		 * driver already and might have driver-specific data allocated


* [PATCH v4 2/7] block: Simplify blk_mq_requeue_work()
  2023-06-21 20:12 [PATCH v4 0/7] Submit zoned writes in order Bart Van Assche
  2023-06-21 20:12 ` [PATCH v4 1/7] block: Rename a local variable in blk_mq_requeue_work() Bart Van Assche
@ 2023-06-21 20:12 ` Bart Van Assche
  2023-06-22  1:16   ` Damien Le Moal
  2023-06-23  5:45   ` Christoph Hellwig
  2023-06-21 20:12 ` [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler Bart Van Assche
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	Ming Lei, Mike Snitzer

Move common code in front of the if-statement.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 41ee393c80a9..f440e4aaaae3 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1458,13 +1458,11 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		 * already.  Insert it into the hctx dispatch list to avoid
 		 * block layer merges for the request.
 		 */
-		if (rq->rq_flags & RQF_DONTPREP) {
-			list_del_init(&rq->queuelist);
+		list_del_init(&rq->queuelist);
+		if (rq->rq_flags & RQF_DONTPREP)
 			blk_mq_request_bypass_insert(rq, 0);
-		} else {
-			list_del_init(&rq->queuelist);
+		else
 			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
-		}
 	}
 
 	while (!list_empty(&flush_list)) {


* [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler
  2023-06-21 20:12 [PATCH v4 0/7] Submit zoned writes in order Bart Van Assche
  2023-06-21 20:12 ` [PATCH v4 1/7] block: Rename a local variable in blk_mq_requeue_work() Bart Van Assche
  2023-06-21 20:12 ` [PATCH v4 2/7] block: Simplify blk_mq_requeue_work() Bart Van Assche
@ 2023-06-21 20:12 ` Bart Van Assche
  2023-06-22  1:19   ` Damien Le Moal
  2023-06-21 20:12 ` [PATCH v4 4/7] block: One requeue list per hctx Bart Van Assche
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	Ming Lei, Mike Snitzer

Send requeued requests to the I/O scheduler if the dispatch order
matters such that the I/O scheduler can control the order in which
requests are dispatched.

This patch reworks commit aef1897cd36d ("blk-mq: insert rq with DONTPREP
to hctx dispatch list when requeue"). Instead of sending DONTPREP requests
to the dispatch list, send these to the I/O scheduler and prevent the I/O
scheduler from merging these requests by adding RQF_DONTPREP to the set of
flags that prevent merging (RQF_NOMERGE_FLAGS).
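
For context, a simplified sketch of the kind of check RQF_NOMERGE_FLAGS
feeds into; example_rq_mergeable() is a made-up, reduced approximation of
the real rq_mergeable() helper, which performs additional checks:

/*
 * Reduced approximation of the merge check; the real helper also rejects
 * passthrough requests and several special operations.
 */
static inline bool example_rq_mergeable(struct request *rq)
{
	if (rq->cmd_flags & REQ_NOMERGE_FLAGS)
		return false;
	/* With this patch, RQF_DONTPREP is part of RQF_NOMERGE_FLAGS. */
	if (rq->rq_flags & RQF_NOMERGE_FLAGS)
		return false;
	return true;
}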

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c         | 10 +++++-----
 include/linux/blk-mq.h |  4 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index f440e4aaaae3..453a90767f7a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1453,13 +1453,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
 	while (!list_empty(&requeue_list)) {
 		rq = list_entry(requeue_list.next, struct request, queuelist);
 		/*
-		 * If RQF_DONTPREP ist set, the request has been started by the
-		 * driver already and might have driver-specific data allocated
-		 * already.  Insert it into the hctx dispatch list to avoid
-		 * block layer merges for the request.
+		 * Only send those RQF_DONTPREP requests to the dispatch list
+		 * that may be reordered freely. If the request order matters,
+		 * send the request to the I/O scheduler.
 		 */
 		list_del_init(&rq->queuelist);
-		if (rq->rq_flags & RQF_DONTPREP)
+		if (rq->rq_flags & RQF_DONTPREP &&
+		    !op_needs_zoned_write_locking(req_op(rq)))
 			blk_mq_request_bypass_insert(rq, 0);
 		else
 			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index f401067ac03a..2610b299ec77 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -62,8 +62,8 @@ typedef __u32 __bitwise req_flags_t;
 #define RQF_RESV		((__force req_flags_t)(1 << 23))
 
 /* flags that prevent us from merging requests: */
-#define RQF_NOMERGE_FLAGS \
-	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
+#define RQF_NOMERGE_FLAGS                                               \
+	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD)
 
 enum mq_rq_state {
 	MQ_RQ_IDLE		= 0,


* [PATCH v4 4/7] block: One requeue list per hctx
  2023-06-21 20:12 [PATCH v4 0/7] Submit zoned writes in order Bart Van Assche
                   ` (2 preceding siblings ...)
  2023-06-21 20:12 ` [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler Bart Van Assche
@ 2023-06-21 20:12 ` Bart Van Assche
  2023-06-23  5:48   ` Christoph Hellwig
  2023-06-26  8:09   ` Ming Lei
  2023-06-21 20:12 ` [PATCH v4 5/7] block: Preserve the order of requeued requests Bart Van Assche
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	Ming Lei, Mike Snitzer

Prepare for processing the requeue list from inside __blk_mq_run_hw_queue().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-flush.c      | 24 ++++++++--------
 block/blk-mq-debugfs.c | 64 +++++++++++++++++++++---------------------
 block/blk-mq.c         | 53 ++++++++++++++++++++--------------
 include/linux/blk-mq.h |  6 ++++
 include/linux/blkdev.h |  5 ----
 5 files changed, 83 insertions(+), 69 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index dba392cf22be..4bfb92f58aa9 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -91,7 +91,7 @@ enum {
 	FLUSH_PENDING_TIMEOUT	= 5 * HZ,
 };
 
-static void blk_kick_flush(struct request_queue *q,
+static void blk_kick_flush(struct blk_mq_hw_ctx *hctx,
 			   struct blk_flush_queue *fq, blk_opf_t flags);
 
 static inline struct blk_flush_queue *
@@ -165,6 +165,7 @@ static void blk_flush_complete_seq(struct request *rq,
 				   unsigned int seq, blk_status_t error)
 {
 	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	blk_opf_t cmd_flags;
 
@@ -188,9 +189,9 @@ static void blk_flush_complete_seq(struct request *rq,
 
 	case REQ_FSEQ_DATA:
 		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
-		spin_lock(&q->requeue_lock);
-		list_add_tail(&rq->queuelist, &q->flush_list);
-		spin_unlock(&q->requeue_lock);
+		spin_lock(&hctx->requeue_lock);
+		list_add_tail(&rq->queuelist, &hctx->flush_list);
+		spin_unlock(&hctx->requeue_lock);
 		blk_mq_kick_requeue_list(q);
 		break;
 
@@ -210,7 +211,7 @@ static void blk_flush_complete_seq(struct request *rq,
 		BUG();
 	}
 
-	blk_kick_flush(q, fq, cmd_flags);
+	blk_kick_flush(hctx, fq, cmd_flags);
 }
 
 static enum rq_end_io_ret flush_end_io(struct request *flush_rq,
@@ -275,7 +276,7 @@ bool is_flush_rq(struct request *rq)
 
 /**
  * blk_kick_flush - consider issuing flush request
- * @q: request_queue being kicked
+ * @hctx: hwq being kicked
  * @fq: flush queue
  * @flags: cmd_flags of the original request
  *
@@ -286,9 +287,10 @@ bool is_flush_rq(struct request *rq)
  * spin_lock_irq(fq->mq_flush_lock)
  *
  */
-static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
-			   blk_opf_t flags)
+static void blk_kick_flush(struct blk_mq_hw_ctx *hctx,
+			   struct blk_flush_queue *fq, blk_opf_t flags)
 {
+	struct request_queue *q = hctx->queue;
 	struct list_head *pending = &fq->flush_queue[fq->flush_pending_idx];
 	struct request *first_rq =
 		list_first_entry(pending, struct request, flush.list);
@@ -348,9 +350,9 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq,
 	smp_wmb();
 	req_ref_set(flush_rq, 1);
 
-	spin_lock(&q->requeue_lock);
-	list_add_tail(&flush_rq->queuelist, &q->flush_list);
-	spin_unlock(&q->requeue_lock);
+	spin_lock(&hctx->requeue_lock);
+	list_add_tail(&flush_rq->queuelist, &hctx->flush_list);
+	spin_unlock(&hctx->requeue_lock);
 
 	blk_mq_kick_requeue_list(q);
 }
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index c3b5930106b2..787bdff3cc64 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -18,37 +18,6 @@ static int queue_poll_stat_show(void *data, struct seq_file *m)
 	return 0;
 }
 
-static void *queue_requeue_list_start(struct seq_file *m, loff_t *pos)
-	__acquires(&q->requeue_lock)
-{
-	struct request_queue *q = m->private;
-
-	spin_lock_irq(&q->requeue_lock);
-	return seq_list_start(&q->requeue_list, *pos);
-}
-
-static void *queue_requeue_list_next(struct seq_file *m, void *v, loff_t *pos)
-{
-	struct request_queue *q = m->private;
-
-	return seq_list_next(v, &q->requeue_list, pos);
-}
-
-static void queue_requeue_list_stop(struct seq_file *m, void *v)
-	__releases(&q->requeue_lock)
-{
-	struct request_queue *q = m->private;
-
-	spin_unlock_irq(&q->requeue_lock);
-}
-
-static const struct seq_operations queue_requeue_list_seq_ops = {
-	.start	= queue_requeue_list_start,
-	.next	= queue_requeue_list_next,
-	.stop	= queue_requeue_list_stop,
-	.show	= blk_mq_debugfs_rq_show,
-};
-
 static int blk_flags_show(struct seq_file *m, const unsigned long flags,
 			  const char *const *flag_name, int flag_name_count)
 {
@@ -157,7 +126,6 @@ static ssize_t queue_state_write(void *data, const char __user *buf,
 
 static const struct blk_mq_debugfs_attr blk_mq_debugfs_queue_attrs[] = {
 	{ "poll_stat", 0400, queue_poll_stat_show },
-	{ "requeue_list", 0400, .seq_ops = &queue_requeue_list_seq_ops },
 	{ "pm_only", 0600, queue_pm_only_show, NULL },
 	{ "state", 0600, queue_state_show, queue_state_write },
 	{ "zone_wlock", 0400, queue_zone_wlock_show, NULL },
@@ -513,6 +481,37 @@ static int hctx_dispatch_busy_show(void *data, struct seq_file *m)
 	return 0;
 }
 
+static void *hctx_requeue_list_start(struct seq_file *m, loff_t *pos)
+	__acquires(&hctx->requeue_lock)
+{
+	struct blk_mq_hw_ctx *hctx = m->private;
+
+	spin_lock_irq(&hctx->requeue_lock);
+	return seq_list_start(&hctx->requeue_list, *pos);
+}
+
+static void *hctx_requeue_list_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct blk_mq_hw_ctx *hctx = m->private;
+
+	return seq_list_next(v, &hctx->requeue_list, pos);
+}
+
+static void hctx_requeue_list_stop(struct seq_file *m, void *v)
+	__releases(&hctx->requeue_lock)
+{
+	struct blk_mq_hw_ctx *hctx = m->private;
+
+	spin_unlock_irq(&hctx->requeue_lock);
+}
+
+static const struct seq_operations hctx_requeue_list_seq_ops = {
+	.start = hctx_requeue_list_start,
+	.next = hctx_requeue_list_next,
+	.stop = hctx_requeue_list_stop,
+	.show = blk_mq_debugfs_rq_show,
+};
+
 #define CTX_RQ_SEQ_OPS(name, type)					\
 static void *ctx_##name##_rq_list_start(struct seq_file *m, loff_t *pos) \
 	__acquires(&ctx->lock)						\
@@ -628,6 +627,7 @@ static const struct blk_mq_debugfs_attr blk_mq_debugfs_hctx_attrs[] = {
 	{"run", 0600, hctx_run_show, hctx_run_write},
 	{"active", 0400, hctx_active_show},
 	{"dispatch_busy", 0400, hctx_dispatch_busy_show},
+	{"requeue_list", 0400, .seq_ops = &hctx_requeue_list_seq_ops},
 	{"type", 0400, hctx_type_show},
 	{},
 };
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 453a90767f7a..c359a28d9b25 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1421,6 +1421,7 @@ static void __blk_mq_requeue_request(struct request *rq)
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 {
 	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx = rq->mq_hctx;
 	unsigned long flags;
 
 	__blk_mq_requeue_request(rq);
@@ -1428,9 +1429,9 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 	/* this request will be re-inserted to io scheduler queue */
 	blk_mq_sched_requeue_request(rq);
 
-	spin_lock_irqsave(&q->requeue_lock, flags);
-	list_add_tail(&rq->queuelist, &q->requeue_list);
-	spin_unlock_irqrestore(&q->requeue_lock, flags);
+	spin_lock_irqsave(&hctx->requeue_lock, flags);
+	list_add_tail(&rq->queuelist, &hctx->requeue_list);
+	spin_unlock_irqrestore(&hctx->requeue_lock, flags);
 
 	if (kick_requeue_list)
 		blk_mq_kick_requeue_list(q);
@@ -1439,16 +1440,16 @@ EXPORT_SYMBOL(blk_mq_requeue_request);
 
 static void blk_mq_requeue_work(struct work_struct *work)
 {
-	struct request_queue *q =
-		container_of(work, struct request_queue, requeue_work.work);
+	struct blk_mq_hw_ctx *hctx =
+		container_of(work, struct blk_mq_hw_ctx, requeue_work.work);
 	LIST_HEAD(requeue_list);
 	LIST_HEAD(flush_list);
 	struct request *rq;
 
-	spin_lock_irq(&q->requeue_lock);
-	list_splice_init(&q->requeue_list, &requeue_list);
-	list_splice_init(&q->flush_list, &flush_list);
-	spin_unlock_irq(&q->requeue_lock);
+	spin_lock_irq(&hctx->requeue_lock);
+	list_splice_init(&hctx->requeue_list, &requeue_list);
+	list_splice_init(&hctx->flush_list, &flush_list);
+	spin_unlock_irq(&hctx->requeue_lock);
 
 	while (!list_empty(&requeue_list)) {
 		rq = list_entry(requeue_list.next, struct request, queuelist);
@@ -1471,20 +1472,30 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		blk_mq_insert_request(rq, 0);
 	}
 
-	blk_mq_run_hw_queues(q, false);
+	blk_mq_run_hw_queue(hctx, false);
 }
 
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
-	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work, 0);
+	struct blk_mq_hw_ctx *hctx;
+	unsigned long i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
+					    &hctx->requeue_work, 0);
 }
 EXPORT_SYMBOL(blk_mq_kick_requeue_list);
 
 void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 				    unsigned long msecs)
 {
-	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work,
-				    msecs_to_jiffies(msecs));
+	struct blk_mq_hw_ctx *hctx;
+	unsigned long i;
+
+	queue_for_each_hw_ctx(q, hctx, i)
+		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
+					    &hctx->requeue_work,
+					    msecs_to_jiffies(msecs));
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
@@ -3614,6 +3625,11 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
 {
+	INIT_DELAYED_WORK(&hctx->requeue_work, blk_mq_requeue_work);
+	INIT_LIST_HEAD(&hctx->flush_list);
+	INIT_LIST_HEAD(&hctx->requeue_list);
+	spin_lock_init(&hctx->requeue_lock);
+
 	hctx->queue_num = hctx_idx;
 
 	if (!(hctx->flags & BLK_MQ_F_STACKING))
@@ -4229,11 +4245,6 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 	q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
 	blk_mq_update_poll_flag(q);
 
-	INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
-	INIT_LIST_HEAD(&q->flush_list);
-	INIT_LIST_HEAD(&q->requeue_list);
-	spin_lock_init(&q->requeue_lock);
-
 	q->nr_requests = set->queue_depth;
 
 	blk_mq_init_cpu_queues(q, set->nr_hw_queues);
@@ -4782,10 +4793,10 @@ void blk_mq_cancel_work_sync(struct request_queue *q)
 	struct blk_mq_hw_ctx *hctx;
 	unsigned long i;
 
-	cancel_delayed_work_sync(&q->requeue_work);
-
-	queue_for_each_hw_ctx(q, hctx, i)
+	queue_for_each_hw_ctx(q, hctx, i) {
+		cancel_delayed_work_sync(&hctx->requeue_work);
 		cancel_delayed_work_sync(&hctx->run_work);
+	}
 }
 
 static int __init blk_mq_init(void)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2610b299ec77..672e8880f9e2 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -308,6 +308,12 @@ struct blk_mq_hw_ctx {
 		unsigned long		state;
 	} ____cacheline_aligned_in_smp;
 
+	struct list_head	flush_list;
+
+	struct list_head	requeue_list;
+	spinlock_t		requeue_lock;
+	struct delayed_work	requeue_work;
+
 	/**
 	 * @run_work: Used for scheduling a hardware queue run at a later time.
 	 */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ed44a997f629..ed4f89657f1f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -479,11 +479,6 @@ struct request_queue {
 	 * for flush operations
 	 */
 	struct blk_flush_queue	*fq;
-	struct list_head	flush_list;
-
-	struct list_head	requeue_list;
-	spinlock_t		requeue_lock;
-	struct delayed_work	requeue_work;
 
 	struct mutex		sysfs_lock;
 	struct mutex		sysfs_dir_lock;


* [PATCH v4 5/7] block: Preserve the order of requeued requests
  2023-06-21 20:12 [PATCH v4 0/7] Submit zoned writes in order Bart Van Assche
                   ` (3 preceding siblings ...)
  2023-06-21 20:12 ` [PATCH v4 4/7] block: One requeue list per hctx Bart Van Assche
@ 2023-06-21 20:12 ` Bart Van Assche
  2023-06-23  5:50   ` Christoph Hellwig
  2023-06-26  8:16   ` Ming Lei
  2023-06-21 20:12 ` [PATCH v4 6/7] dm: Inline __dm_mq_kick_requeue_list() Bart Van Assche
  2023-06-21 20:12 ` [PATCH v4 7/7] block: Inline blk_mq_{,delay_}kick_requeue_list() Bart Van Assche
  6 siblings, 2 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	Ming Lei, Mike Snitzer

If a queue is run before all requeued requests have been sent to the I/O
scheduler, the I/O scheduler may dispatch the wrong request. Fix this by
making blk_mq_run_hw_queue() process the requeue_list instead of
blk_mq_requeue_work().

Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-mq.c         | 31 +++++++++----------------------
 include/linux/blk-mq.h |  1 -
 2 files changed, 9 insertions(+), 23 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index c359a28d9b25..de39984d17c4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -68,6 +68,8 @@ static inline blk_qc_t blk_rq_to_qc(struct request *rq)
 static bool blk_mq_hctx_has_pending(struct blk_mq_hw_ctx *hctx)
 {
 	return !list_empty_careful(&hctx->dispatch) ||
+		!list_empty_careful(&hctx->requeue_list) ||
+		!list_empty_careful(&hctx->flush_list) ||
 		sbitmap_any_bit_set(&hctx->ctx_map) ||
 			blk_mq_sched_has_work(hctx);
 }
@@ -1438,10 +1440,8 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
 
-static void blk_mq_requeue_work(struct work_struct *work)
+static void blk_mq_process_requeue_list(struct blk_mq_hw_ctx *hctx)
 {
-	struct blk_mq_hw_ctx *hctx =
-		container_of(work, struct blk_mq_hw_ctx, requeue_work.work);
 	LIST_HEAD(requeue_list);
 	LIST_HEAD(flush_list);
 	struct request *rq;
@@ -1471,31 +1471,18 @@ static void blk_mq_requeue_work(struct work_struct *work)
 		list_del_init(&rq->queuelist);
 		blk_mq_insert_request(rq, 0);
 	}
-
-	blk_mq_run_hw_queue(hctx, false);
 }
 
 void blk_mq_kick_requeue_list(struct request_queue *q)
 {
-	struct blk_mq_hw_ctx *hctx;
-	unsigned long i;
-
-	queue_for_each_hw_ctx(q, hctx, i)
-		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
-					    &hctx->requeue_work, 0);
+	blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL(blk_mq_kick_requeue_list);
 
 void blk_mq_delay_kick_requeue_list(struct request_queue *q,
 				    unsigned long msecs)
 {
-	struct blk_mq_hw_ctx *hctx;
-	unsigned long i;
-
-	queue_for_each_hw_ctx(q, hctx, i)
-		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
-					    &hctx->requeue_work,
-					    msecs_to_jiffies(msecs));
+	blk_mq_delay_run_hw_queues(q, msecs);
 }
 EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
 
@@ -2248,6 +2235,7 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
 		return;
 	}
 
+	blk_mq_process_requeue_list(hctx);
 	blk_mq_run_dispatch_ops(hctx->queue,
 				blk_mq_sched_dispatch_requests(hctx));
 }
@@ -2296,7 +2284,7 @@ void blk_mq_run_hw_queues(struct request_queue *q, bool async)
 		 * scheduler.
 		 */
 		if (!sq_hctx || sq_hctx == hctx ||
-		    !list_empty_careful(&hctx->dispatch))
+		    blk_mq_hctx_has_pending(hctx))
 			blk_mq_run_hw_queue(hctx, async);
 	}
 }
@@ -2332,7 +2320,7 @@ void blk_mq_delay_run_hw_queues(struct request_queue *q, unsigned long msecs)
 		 * scheduler.
 		 */
 		if (!sq_hctx || sq_hctx == hctx ||
-		    !list_empty_careful(&hctx->dispatch))
+		    blk_mq_hctx_has_pending(hctx))
 			blk_mq_delay_run_hw_queue(hctx, msecs);
 	}
 }
@@ -2417,6 +2405,7 @@ static void blk_mq_run_work_fn(struct work_struct *work)
 	struct blk_mq_hw_ctx *hctx =
 		container_of(work, struct blk_mq_hw_ctx, run_work.work);
 
+	blk_mq_process_requeue_list(hctx);
 	blk_mq_run_dispatch_ops(hctx->queue,
 				blk_mq_sched_dispatch_requests(hctx));
 }
@@ -3625,7 +3614,6 @@ static int blk_mq_init_hctx(struct request_queue *q,
 		struct blk_mq_tag_set *set,
 		struct blk_mq_hw_ctx *hctx, unsigned hctx_idx)
 {
-	INIT_DELAYED_WORK(&hctx->requeue_work, blk_mq_requeue_work);
 	INIT_LIST_HEAD(&hctx->flush_list);
 	INIT_LIST_HEAD(&hctx->requeue_list);
 	spin_lock_init(&hctx->requeue_lock);
@@ -4794,7 +4782,6 @@ void blk_mq_cancel_work_sync(struct request_queue *q)
 	unsigned long i;
 
 	queue_for_each_hw_ctx(q, hctx, i) {
-		cancel_delayed_work_sync(&hctx->requeue_work);
 		cancel_delayed_work_sync(&hctx->run_work);
 	}
 }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 672e8880f9e2..b919de53dc28 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -312,7 +312,6 @@ struct blk_mq_hw_ctx {
 
 	struct list_head	requeue_list;
 	spinlock_t		requeue_lock;
-	struct delayed_work	requeue_work;
 
 	/**
 	 * @run_work: Used for scheduling a hardware queue run at a later time.


* [PATCH v4 6/7] dm: Inline __dm_mq_kick_requeue_list()
  2023-06-21 20:12 [PATCH v4 0/7] Submit zoned writes in order Bart Van Assche
                   ` (4 preceding siblings ...)
  2023-06-21 20:12 ` [PATCH v4 5/7] block: Preserve the order of requeued requests Bart Van Assche
@ 2023-06-21 20:12 ` Bart Van Assche
  2023-06-21 20:12 ` [PATCH v4 7/7] block: Inline blk_mq_{,delay_}kick_requeue_list() Bart Van Assche
  6 siblings, 0 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Damien Le Moal,
	Ming Lei, Mike Snitzer, Alasdair Kergon, dm-devel

Since commit 52d7f1b5c2f3 ("blk-mq: Avoid that requeueing starts stopped
queues"), the function __dm_mq_kick_requeue_list() is too short to keep as
a separate function. Hence, inline this function.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/md/dm-rq.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index f7e9a3632eb3..bbe1e2ea0aa4 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -168,21 +168,16 @@ static void dm_end_request(struct request *clone, blk_status_t error)
 	rq_completed(md);
 }
 
-static void __dm_mq_kick_requeue_list(struct request_queue *q, unsigned long msecs)
-{
-	blk_mq_delay_kick_requeue_list(q, msecs);
-}
-
 void dm_mq_kick_requeue_list(struct mapped_device *md)
 {
-	__dm_mq_kick_requeue_list(md->queue, 0);
+	blk_mq_kick_requeue_list(md->queue);
 }
 EXPORT_SYMBOL(dm_mq_kick_requeue_list);
 
 static void dm_mq_delay_requeue_request(struct request *rq, unsigned long msecs)
 {
 	blk_mq_requeue_request(rq, false);
-	__dm_mq_kick_requeue_list(rq->q, msecs);
+	blk_mq_delay_kick_requeue_list(rq->q, msecs);
 }
 
 static void dm_requeue_original_request(struct dm_rq_target_io *tio, bool delay_requeue)


* [PATCH v4 7/7] block: Inline blk_mq_{,delay_}kick_requeue_list()
  2023-06-21 20:12 [PATCH v4 0/7] Submit zoned writes in order Bart Van Assche
                   ` (5 preceding siblings ...)
  2023-06-21 20:12 ` [PATCH v4 6/7] dm: Inline __dm_mq_kick_requeue_list() Bart Van Assche
@ 2023-06-21 20:12 ` Bart Van Assche
  2023-06-22  7:31   ` Roger Pau Monné
  2023-06-23  5:52   ` Christoph Hellwig
  6 siblings, 2 replies; 24+ messages in thread
From: Bart Van Assche @ 2023-06-21 20:12 UTC (permalink / raw
  To: Jens Axboe
  Cc: linux-block, Christoph Hellwig, Bart Van Assche, Vineeth Vijayan,
	Damien Le Moal, Ming Lei, Mike Snitzer, Roger Pau Monné,
	Juergen Gross, Stefano Stabellini, Alasdair Kergon, dm-devel,
	Keith Busch, Sagi Grimberg, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, James E.J. Bottomley, Martin K. Petersen

Patch "block: Preserve the order of requeued requests" changed
blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() into
blk_mq_run_hw_queues() and blk_mq_delay_run_hw_queues() calls
respectively. Inline blk_mq_{,delay_}kick_requeue_list() because these
functions are now too short to keep these as separate functions.

Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com> [ for the s390 changes ]
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <dlemoal@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 block/blk-flush.c            |  4 ++--
 block/blk-mq-debugfs.c       |  2 +-
 block/blk-mq.c               | 15 +--------------
 drivers/block/ublk_drv.c     |  6 +++---
 drivers/block/xen-blkfront.c |  1 -
 drivers/md/dm-rq.c           |  6 +++---
 drivers/nvme/host/core.c     |  2 +-
 drivers/s390/block/scm_blk.c |  2 +-
 drivers/scsi/scsi_lib.c      |  2 +-
 include/linux/blk-mq.h       |  2 --
 10 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 4bfb92f58aa9..157b86fd9ccb 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -192,7 +192,7 @@ static void blk_flush_complete_seq(struct request *rq,
 		spin_lock(&hctx->requeue_lock);
 		list_add_tail(&rq->queuelist, &hctx->flush_list);
 		spin_unlock(&hctx->requeue_lock);
-		blk_mq_kick_requeue_list(q);
+		blk_mq_run_hw_queues(q, true);
 		break;
 
 	case REQ_FSEQ_DONE:
@@ -354,7 +354,7 @@ static void blk_kick_flush(struct blk_mq_hw_ctx *hctx,
 	list_add_tail(&flush_rq->queuelist, &hctx->flush_list);
 	spin_unlock(&hctx->requeue_lock);
 
-	blk_mq_kick_requeue_list(q);
+	blk_mq_run_hw_queues(q, true);
 }
 
 static enum rq_end_io_ret mq_flush_data_end_io(struct request *rq,
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 787bdff3cc64..76792ebab935 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -114,7 +114,7 @@ static ssize_t queue_state_write(void *data, const char __user *buf,
 	} else if (strcmp(op, "start") == 0) {
 		blk_mq_start_stopped_hw_queues(q, true);
 	} else if (strcmp(op, "kick") == 0) {
-		blk_mq_kick_requeue_list(q);
+		blk_mq_run_hw_queues(q, true);
 	} else {
 		pr_err("%s: unsupported operation '%s'\n", __func__, op);
 inval:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index de39984d17c4..12fd8b65b930 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1436,7 +1436,7 @@ void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
 	spin_unlock_irqrestore(&hctx->requeue_lock, flags);
 
 	if (kick_requeue_list)
-		blk_mq_kick_requeue_list(q);
+		blk_mq_run_hw_queues(q, true);
 }
 EXPORT_SYMBOL(blk_mq_requeue_request);
 
@@ -1473,19 +1473,6 @@ static void blk_mq_process_requeue_list(struct blk_mq_hw_ctx *hctx)
 	}
 }
 
-void blk_mq_kick_requeue_list(struct request_queue *q)
-{
-	blk_mq_run_hw_queues(q, true);
-}
-EXPORT_SYMBOL(blk_mq_kick_requeue_list);
-
-void blk_mq_delay_kick_requeue_list(struct request_queue *q,
-				    unsigned long msecs)
-{
-	blk_mq_delay_run_hw_queues(q, msecs);
-}
-EXPORT_SYMBOL(blk_mq_delay_kick_requeue_list);
-
 static bool blk_mq_rq_inflight(struct request *rq, void *priv)
 {
 	/*
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 1c823750c95a..cddbbdc9b199 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -902,7 +902,7 @@ static inline void __ublk_rq_task_work(struct request *req,
 		 */
 		if (unlikely(!mapped_bytes)) {
 			blk_mq_requeue_request(req, false);
-			blk_mq_delay_kick_requeue_list(req->q,
+			blk_mq_delay_run_hw_queues(req->q,
 					UBLK_REQUEUE_DELAY_MS);
 			return;
 		}
@@ -1297,7 +1297,7 @@ static void ublk_unquiesce_dev(struct ublk_device *ub)
 
 	blk_mq_unquiesce_queue(ub->ub_disk->queue);
 	/* We may have requeued some rqs in ublk_quiesce_queue() */
-	blk_mq_kick_requeue_list(ub->ub_disk->queue);
+	blk_mq_run_hw_queues(ub->ub_disk->queue, true);
 }
 
 static void ublk_stop_dev(struct ublk_device *ub)
@@ -2341,7 +2341,7 @@ static int ublk_ctrl_end_recovery(struct ublk_device *ub,
 	blk_mq_unquiesce_queue(ub->ub_disk->queue);
 	pr_devel("%s: queue unquiesced, dev id %d.\n",
 			__func__, header->dev_id);
-	blk_mq_kick_requeue_list(ub->ub_disk->queue);
+	blk_mq_run_hw_queues(ub->ub_disk->queue, true);
 	ub->dev_info.state = UBLK_S_DEV_LIVE;
 	schedule_delayed_work(&ub->monitor_work, UBLK_DAEMON_MONITOR_PERIOD);
 	ret = 0;
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 52e74adbaad6..b8ac217c92b6 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -2040,7 +2040,6 @@ static int blkif_recover(struct blkfront_info *info)
 		blk_mq_requeue_request(req, false);
 	}
 	blk_mq_start_stopped_hw_queues(info->rq, true);
-	blk_mq_kick_requeue_list(info->rq);
 
 	while ((bio = bio_list_pop(&info->bio_list)) != NULL) {
 		/* Traverse the list of pending bios and re-queue them */
diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
index bbe1e2ea0aa4..6421cc2c9852 100644
--- a/drivers/md/dm-rq.c
+++ b/drivers/md/dm-rq.c
@@ -64,7 +64,7 @@ int dm_request_based(struct mapped_device *md)
 void dm_start_queue(struct request_queue *q)
 {
 	blk_mq_unquiesce_queue(q);
-	blk_mq_kick_requeue_list(q);
+	blk_mq_run_hw_queues(q, true);
 }
 
 void dm_stop_queue(struct request_queue *q)
@@ -170,14 +170,14 @@ static void dm_end_request(struct request *clone, blk_status_t error)
 
 void dm_mq_kick_requeue_list(struct mapped_device *md)
 {
-	blk_mq_kick_requeue_list(md->queue);
+	blk_mq_run_hw_queues(md->queue, true);
 }
 EXPORT_SYMBOL(dm_mq_kick_requeue_list);
 
 static void dm_mq_delay_requeue_request(struct request *rq, unsigned long msecs)
 {
 	blk_mq_requeue_request(rq, false);
-	blk_mq_delay_kick_requeue_list(rq->q, msecs);
+	blk_mq_delay_run_hw_queues(rq->q, msecs);
 }
 
 static void dm_requeue_original_request(struct dm_rq_target_io *tio, bool delay_requeue)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index f5dd6d8c7e1d..9b923d52e41c 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -303,7 +303,7 @@ static void nvme_retry_req(struct request *req)
 
 	nvme_req(req)->retries++;
 	blk_mq_requeue_request(req, false);
-	blk_mq_delay_kick_requeue_list(req->q, delay);
+	blk_mq_delay_run_hw_queues(req->q, delay);
 }
 
 static void nvme_log_error(struct request *req)
diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c
index 0c1df1d5f1ac..fe5937d28fdc 100644
--- a/drivers/s390/block/scm_blk.c
+++ b/drivers/s390/block/scm_blk.c
@@ -243,7 +243,7 @@ static void scm_request_requeue(struct scm_request *scmrq)
 
 	atomic_dec(&bdev->queued_reqs);
 	scm_request_done(scmrq);
-	blk_mq_kick_requeue_list(bdev->rq);
+	blk_mq_run_hw_queues(bdev->rq, true);
 }
 
 static void scm_request_finish(struct scm_request *scmrq)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0226c9279cef..2aa3c147e12f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -124,7 +124,7 @@ static void scsi_mq_requeue_cmd(struct scsi_cmnd *cmd, unsigned long msecs)
 
 	if (msecs) {
 		blk_mq_requeue_request(rq, false);
-		blk_mq_delay_kick_requeue_list(rq->q, msecs);
+		blk_mq_delay_run_hw_queues(rq->q, msecs);
 	} else
 		blk_mq_requeue_request(rq, true);
 }
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b919de53dc28..80761e7c4ea5 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -871,8 +871,6 @@ static inline bool blk_mq_add_to_batch(struct request *req,
 }
 
 void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list);
-void blk_mq_kick_requeue_list(struct request_queue *q);
-void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs);
 void blk_mq_complete_request(struct request *rq);
 bool blk_mq_complete_request_remote(struct request *rq);
 void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx);


* Re: [PATCH v4 1/7] block: Rename a local variable in blk_mq_requeue_work()
  2023-06-21 20:12 ` [PATCH v4 1/7] block: Rename a local variable in blk_mq_requeue_work() Bart Van Assche
@ 2023-06-22  1:15   ` Damien Le Moal
  0 siblings, 0 replies; 24+ messages in thread
From: Damien Le Moal @ 2023-06-22  1:15 UTC (permalink / raw
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Johannes Thumshirn, Ming Lei,
	Mike Snitzer

On 6/22/23 05:12, Bart Van Assche wrote:
> Two data structures in blk_mq_requeue_work() represent request lists. Make
> it clear that rq_list holds requests that come from the requeue list by
> renaming that data structure.
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH v4 2/7] block: Simplify blk_mq_requeue_work()
  2023-06-21 20:12 ` [PATCH v4 2/7] block: Simplify blk_mq_requeue_work() Bart Van Assche
@ 2023-06-22  1:16   ` Damien Le Moal
  2023-06-23  5:45   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Damien Le Moal @ 2023-06-22  1:16 UTC (permalink / raw
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Snitzer

On 6/22/23 05:12, Bart Van Assche wrote:
> Move common code in front of the if-statement.
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler
  2023-06-21 20:12 ` [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler Bart Van Assche
@ 2023-06-22  1:19   ` Damien Le Moal
  2023-06-22 17:23     ` Bart Van Assche
  0 siblings, 1 reply; 24+ messages in thread
From: Damien Le Moal @ 2023-06-22  1:19 UTC (permalink / raw
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Snitzer

On 6/22/23 05:12, Bart Van Assche wrote:
> Send requeued requests to the I/O scheduler if the dispatch order
> matters such that the I/O scheduler can control the order in which
> requests are dispatched.
> 
> This patch reworks commit aef1897cd36d ("blk-mq: insert rq with DONTPREP
> to hctx dispatch list when requeue"). Instead of sending DONTPREP
> requests to the dispatch list, send these to the I/O scheduler and
> prevent that the I/O scheduler merges these requests by adding
> RQF_DONTPREP to the list of flags that prevent merging
> (RQF_NOMERGE_FLAGS).
> 
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  block/blk-mq.c         | 10 +++++-----
>  include/linux/blk-mq.h |  4 ++--
>  2 files changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f440e4aaaae3..453a90767f7a 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1453,13 +1453,13 @@ static void blk_mq_requeue_work(struct work_struct *work)
>  	while (!list_empty(&requeue_list)) {
>  		rq = list_entry(requeue_list.next, struct request, queuelist);
>  		/*
> -		 * If RQF_DONTPREP ist set, the request has been started by the
> -		 * driver already and might have driver-specific data allocated
> -		 * already.  Insert it into the hctx dispatch list to avoid
> -		 * block layer merges for the request.
> +		 * Only send those RQF_DONTPREP requests to the dispatch list
> +		 * that may be reordered freely. If the request order matters,
> +		 * send the request to the I/O scheduler.
>  		 */
>  		list_del_init(&rq->queuelist);
> -		if (rq->rq_flags & RQF_DONTPREP)
> +		if (rq->rq_flags & RQF_DONTPREP &&
> +		    !op_needs_zoned_write_locking(req_op(rq)))

Why? I still do not understand the need for this. There is always only a
single in-flight write per sequential zone. Requeuing that in-flight write
directly to the dispatch list will not reorder writes, and it is better for
command latency.

>  			blk_mq_request_bypass_insert(rq, 0);
>  		else
>  			blk_mq_insert_request(rq, BLK_MQ_INSERT_AT_HEAD);
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index f401067ac03a..2610b299ec77 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -62,8 +62,8 @@ typedef __u32 __bitwise req_flags_t;
>  #define RQF_RESV		((__force req_flags_t)(1 << 23))
>  
>  /* flags that prevent us from merging requests: */
> -#define RQF_NOMERGE_FLAGS \
> -	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_SPECIAL_PAYLOAD)
> +#define RQF_NOMERGE_FLAGS                                               \
> +	(RQF_STARTED | RQF_FLUSH_SEQ | RQF_DONTPREP | RQF_SPECIAL_PAYLOAD)
>  
>  enum mq_rq_state {
>  	MQ_RQ_IDLE		= 0,

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH v4 7/7] block: Inline blk_mq_{,delay_}kick_requeue_list()
  2023-06-21 20:12 ` [PATCH v4 7/7] block: Inline blk_mq_{,delay_}kick_requeue_list() Bart Van Assche
@ 2023-06-22  7:31   ` Roger Pau Monné
  2023-06-23  5:52   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Roger Pau Monné @ 2023-06-22  7:31 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Vineeth Vijayan,
	Damien Le Moal, Ming Lei, Mike Snitzer, Juergen Gross,
	Stefano Stabellini, Alasdair Kergon, dm-devel, Keith Busch,
	Sagi Grimberg, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	James E.J. Bottomley, Martin K. Petersen

On Wed, Jun 21, 2023 at 01:12:34PM -0700, Bart Van Assche wrote:
> Patch "block: Preserve the order of requeued requests" changed
> blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() into
> blk_mq_run_hw_queues() and blk_mq_delay_run_hw_queues() calls
> respectively. Inline blk_mq_{,delay_}kick_requeue_list() because these
> functions are now too short to keep these as separate functions.
> 
> Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com> [ for the s390 changes ]
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Ming Lei <ming.lei@redhat.com>
> Cc: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

For the blkfront change:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.


* Re: [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler
  2023-06-22  1:19   ` Damien Le Moal
@ 2023-06-22 17:23     ` Bart Van Assche
  2023-06-22 21:50       ` Damien Le Moal
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-06-22 17:23 UTC (permalink / raw
  To: Damien Le Moal, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Snitzer

On 6/21/23 18:19, Damien Le Moal wrote:
> Why ? I still do not understand the need for this. There is always only a single
> in-flight write per sequential zone. Requeuing that in-flight write directly to
> the dispatch list will not reorder writes and it will be better for the command
> latency.
Hi Damien,

After having taken a closer look I see that blk_req_zone_write_unlock() 
is called from inside dd_insert_request() when requeuing a request. 
Hence, there is no reordering risk when requeuing a zoned write. I will 
drop this patch.
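
For reference, a rough sketch of the zone write locking rule being relied
on here (example_dispatch_zoned_write() is made up; blk_req_zone_write_trylock()
and blk_req_zone_write_unlock() are the real helpers):

/*
 * Dispatching a write to a sequential zone requires taking the zone write
 * lock, so at most one write per zone is in flight at any time. The lock is
 * dropped on completion or, when the request is requeued, from
 * dd_insert_request() via blk_req_zone_write_unlock().
 */
static struct request *example_dispatch_zoned_write(struct request *rq)
{
	if (!blk_req_zone_write_trylock(rq))
		return NULL;	/* another write for this zone is in flight */

	return rq;		/* safe to dispatch without reordering */
}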

Do you agree with having one requeue list per hctx instead of per 
request queue? This change enables eliminating 
blk_mq_kick_requeue_list(). I think that's an interesting simplification 
of the block layer API.

Thanks,

Bart.


* Re: [PATCH v4 3/7] block: Send requeued requests to the I/O scheduler
  2023-06-22 17:23     ` Bart Van Assche
@ 2023-06-22 21:50       ` Damien Le Moal
  0 siblings, 0 replies; 24+ messages in thread
From: Damien Le Moal @ 2023-06-22 21:50 UTC (permalink / raw
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block, Christoph Hellwig, Ming Lei, Mike Snitzer

On 6/23/23 02:23, Bart Van Assche wrote:
> On 6/21/23 18:19, Damien Le Moal wrote:
>> Why ? I still do not understand the need for this. There is always only a single
>> in-flight write per sequential zone. Requeuing that in-flight write directly to
>> the dispatch list will not reorder writes and it will be better for the command
>> latency.
> Hi Damien,
> 
> After having taken a closer look I see that blk_req_zone_write_unlock() 
> is called from inside dd_insert_request() when requeuing a request. 
> Hence, there is no reordering risk when requeuing a zoned write. I will 
> drop this patch.

OK. Thanks.

> 
> Do you agree with having one requeue list per hctx instead of per 
> request queue? This change enables eliminating 
> blk_mq_kick_requeue_list(). I think that's an interesting simplification 
> of the block layer API.

I do not see any issue with that. Indeed, it does simplify the code nicely.
Reading patch 5, I wondered though whether it is really worth keeping the
helpers blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list().
Maybe calling blk_mq_run_hw_queues() and blk_mq_delay_run_hw_queues()
directly is better? No strong opinion though.

-- 
Damien Le Moal
Western Digital Research



* Re: [PATCH v4 2/7] block: Simplify blk_mq_requeue_work()
  2023-06-21 20:12 ` [PATCH v4 2/7] block: Simplify blk_mq_requeue_work() Bart Van Assche
  2023-06-22  1:16   ` Damien Le Moal
@ 2023-06-23  5:45   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-06-23  5:45 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Ming Lei, Mike Snitzer

On Wed, Jun 21, 2023 at 01:12:29PM -0700, Bart Van Assche wrote:
> Move common code in front of the if-statement.

This mechanical change looks OK to me; it's more the existing and
prevailing logic here that I'm worried about...

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v4 4/7] block: One requeue list per hctx
  2023-06-21 20:12 ` [PATCH v4 4/7] block: One requeue list per hctx Bart Van Assche
@ 2023-06-23  5:48   ` Christoph Hellwig
  2023-06-26  8:09   ` Ming Lei
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-06-23  5:48 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Ming Lei, Mike Snitzer

>  void blk_mq_kick_requeue_list(struct request_queue *q)
>  {
> -	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work, 0);
> +	struct blk_mq_hw_ctx *hctx;
> +	unsigned long i;
> +
> +	queue_for_each_hw_ctx(q, hctx, i)
> +		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
> +					    &hctx->requeue_work, 0);
>  }
>  EXPORT_SYMBOL(blk_mq_kick_requeue_list);
>  
>  void blk_mq_delay_kick_requeue_list(struct request_queue *q,
>  				    unsigned long msecs)
>  {
> -	kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND, &q->requeue_work,
> -				    msecs_to_jiffies(msecs));
> +	struct blk_mq_hw_ctx *hctx;
> +	unsigned long i;
> +
> +	queue_for_each_hw_ctx(q, hctx, i)
> +		kblockd_mod_delayed_work_on(WORK_CPU_UNBOUND,
> +					    &hctx->requeue_work,
> +					    msecs_to_jiffies(msecs));
>  }

I'd implement blk_mq_kick_requeue_list() as a wrapper around
blk_mq_delay_kick_requeue_list() that passes a 0 msec argument.
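
I.e. something like this (a sketch of the suggested shape, not code taken
from the series):

void blk_mq_kick_requeue_list(struct request_queue *q)
{
	blk_mq_delay_kick_requeue_list(q, 0);
}
EXPORT_SYMBOL(blk_mq_kick_requeue_list);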

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v4 5/7] block: Preserve the order of requeued requests
  2023-06-21 20:12 ` [PATCH v4 5/7] block: Preserve the order of requeued requests Bart Van Assche
@ 2023-06-23  5:50   ` Christoph Hellwig
  2023-06-23 20:08     ` Bart Van Assche
  2023-06-26  8:16   ` Ming Lei
  1 sibling, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2023-06-23  5:50 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Ming Lei, Mike Snitzer

> +	blk_mq_process_requeue_list(hctx);

Should this do list_empty_careful() checks on ->requeue_list and
->flush_list before calling into blk_mq_process_requeue_list(), so that
we can avoid taking the requeue lock when both lists are empty?


* Re: [PATCH v4 7/7] block: Inline blk_mq_{,delay_}kick_requeue_list()
  2023-06-21 20:12 ` [PATCH v4 7/7] block: Inline blk_mq_{,delay_}kick_requeue_list() Bart Van Assche
  2023-06-22  7:31   ` Roger Pau Monné
@ 2023-06-23  5:52   ` Christoph Hellwig
  1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-06-23  5:52 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Vineeth Vijayan,
	Damien Le Moal, Ming Lei, Mike Snitzer, Roger Pau Monné,
	Juergen Gross, Stefano Stabellini, Alasdair Kergon, dm-devel,
	Keith Busch, Sagi Grimberg, Heiko Carstens, Vasily Gorbik,
	Alexander Gordeev, James E.J. Bottomley, Martin K. Petersen

On Wed, Jun 21, 2023 at 01:12:34PM -0700, Bart Van Assche wrote:
> Patch "block: Preserve the order of requeued requests" changed
> blk_mq_kick_requeue_list() and blk_mq_delay_kick_requeue_list() into
> blk_mq_run_hw_queues() and blk_mq_delay_run_hw_queues() calls
> respectively. Inline blk_mq_{,delay_}kick_requeue_list() because these
> functions are now too short to keep these as separate functions.

You're not inlining them; you're removing them and open-coding the
trivial logic in the callers.

The code change looks good to me, though.


* Re: [PATCH v4 5/7] block: Preserve the order of requeued requests
  2023-06-23  5:50   ` Christoph Hellwig
@ 2023-06-23 20:08     ` Bart Van Assche
  2023-06-26  8:01       ` Christoph Hellwig
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-06-23 20:08 UTC (permalink / raw
  To: Christoph Hellwig
  Cc: Jens Axboe, linux-block, Damien Le Moal, Ming Lei, Mike Snitzer

On 6/22/23 22:50, Christoph Hellwig wrote:
>> +	blk_mq_process_requeue_list(hctx);
> 
> Should this do list_empty_careful checks on ->requeue_list and
> ->flush_list so that we can avoid taking the requeue lock when
> these conditions aren't met before calling into
> blk_mq_process_requeue_list?

Hi Christoph,

I agree that checks of whether requeue_list and flush_list are empty
should be added.

Does it matter in this context whether list_empty_careful() or
list_empty() is used? If blk_mq_process_requeue_list() is called
concurrently with code that adds an element to one of these two lists,
it is guaranteed that the queue will be run again. This is why I think
it is fine for the list checks in blk_mq_process_requeue_list() to race
with concurrent list additions.
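
Concretely, the check under discussion would look roughly like this (an
assumed shape for the next version, combining the suggested early return
with the splice code from patches 4/7 and 5/7; not the code that was posted):

static void blk_mq_process_requeue_list(struct blk_mq_hw_ctx *hctx)
{
	LIST_HEAD(requeue_list);
	LIST_HEAD(flush_list);

	/*
	 * Lockless check: a request added concurrently is guaranteed to
	 * trigger another queue run, so a false negative here is harmless.
	 */
	if (list_empty_careful(&hctx->requeue_list) &&
	    list_empty_careful(&hctx->flush_list))
		return;

	spin_lock_irq(&hctx->requeue_lock);
	list_splice_init(&hctx->requeue_list, &requeue_list);
	list_splice_init(&hctx->flush_list, &flush_list);
	spin_unlock_irq(&hctx->requeue_lock);

	/* ... insert the requests as in patch 5/7 ... */
}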

Thanks,

Bart.


* Re: [PATCH v4 5/7] block: Preserve the order of requeued requests
  2023-06-23 20:08     ` Bart Van Assche
@ 2023-06-26  8:01       ` Christoph Hellwig
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2023-06-26  8:01 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, linux-block, Damien Le Moal,
	Ming Lei, Mike Snitzer

On Fri, Jun 23, 2023 at 01:08:33PM -0700, Bart Van Assche wrote:
> Does it matter in this context whether list_empty_careful() or list_empty() 
> is used? If blk_mq_process_requeue_list() is called concurrently with code 
> that adds an element to one of these two lists it is guaranteed that the 
> queue will be run again. This is why I think that it is fine that the list 
> checks in blk_mq_process_requeue_list() race with concurrent list 
> additions.

As far as I can tell we're not holding the relevant lock, so I think we
need list_empty_careful.  Or am I missing something?


* Re: [PATCH v4 4/7] block: One requeue list per hctx
  2023-06-21 20:12 ` [PATCH v4 4/7] block: One requeue list per hctx Bart Van Assche
  2023-06-23  5:48   ` Christoph Hellwig
@ 2023-06-26  8:09   ` Ming Lei
  2023-06-26 16:54     ` Bart Van Assche
  1 sibling, 1 reply; 24+ messages in thread
From: Ming Lei @ 2023-06-26  8:09 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Mike Snitzer

On Wed, Jun 21, 2023 at 01:12:31PM -0700, Bart Van Assche wrote:
> Prepare for processing the requeue list from inside __blk_mq_run_hw_queue().

Please add more details about the motivation for this kind of change.

IMO requeueing is supposed to run in a slow code path, so I do not see why
it has to be moved to the hw queue level.

Thanks,
Ming



* Re: [PATCH v4 5/7] block: Preserve the order of requeued requests
  2023-06-21 20:12 ` [PATCH v4 5/7] block: Preserve the order of requeued requests Bart Van Assche
  2023-06-23  5:50   ` Christoph Hellwig
@ 2023-06-26  8:16   ` Ming Lei
  1 sibling, 0 replies; 24+ messages in thread
From: Ming Lei @ 2023-06-26  8:16 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Mike Snitzer

On Wed, Jun 21, 2023 at 01:12:32PM -0700, Bart Van Assche wrote:
> If a queue is run before all requeued requests have been sent to the I/O
> scheduler, the I/O scheduler may dispatch the wrong request. Fix this by

Can you explain the issue in more detail? What is the wrong request? How?

If you mean the write order of requeued write requests, there isn't such an
issue, given that a new write request for the same zone cannot be issued
before the old requeued one has completed.

Thanks, 
Ming



* Re: [PATCH v4 4/7] block: One requeue list per hctx
  2023-06-26  8:09   ` Ming Lei
@ 2023-06-26 16:54     ` Bart Van Assche
  2023-06-27  1:06       ` Ming Lei
  0 siblings, 1 reply; 24+ messages in thread
From: Bart Van Assche @ 2023-06-26 16:54 UTC (permalink / raw
  To: Ming Lei
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Mike Snitzer

On 6/26/23 01:09, Ming Lei wrote:
> On Wed, Jun 21, 2023 at 01:12:31PM -0700, Bart Van Assche wrote:
>> Prepare for processing the requeue list from inside __blk_mq_run_hw_queue().
> 
> Please add more details about the motivation for this kind of change.
> 
> IMO requeue is supposed to be run in slow code path, not see reason why
> it has to be moved to hw queue level.

As Damien explained, the code for sending requeued requests to the I/O
scheduler should be dropped because this is not necessary in order to
prevent reordering of requeued zoned writes.

The remaining advantages of this patch series are:
* Simplification of the block layer API since the
   blk_mq_{,delay_}kick_requeue_list() functions are removed.
* Faster requeuing. Although I don't have a use case that needs
   faster requeuing, it is interesting to see that this patch series
   makes the blktests test that exercises requeuing run twice as fast.

Is there agreement to proceed with this patch series if all review
comments are addressed?

Thanks,

Bart.




* Re: [PATCH v4 4/7] block: One requeue list per hctx
  2023-06-26 16:54     ` Bart Van Assche
@ 2023-06-27  1:06       ` Ming Lei
  0 siblings, 0 replies; 24+ messages in thread
From: Ming Lei @ 2023-06-27  1:06 UTC (permalink / raw
  To: Bart Van Assche
  Cc: Jens Axboe, linux-block, Christoph Hellwig, Damien Le Moal,
	Mike Snitzer

On Mon, Jun 26, 2023 at 09:54:01AM -0700, Bart Van Assche wrote:
> On 6/26/23 01:09, Ming Lei wrote:
> > On Wed, Jun 21, 2023 at 01:12:31PM -0700, Bart Van Assche wrote:
> > > Prepare for processing the requeue list from inside __blk_mq_run_hw_queue().
> > 
> > Please add more details about the motivation for this kind of change.
> > 
> > IMO requeue is supposed to be run in slow code path, not see reason why
> > it has to be moved to hw queue level.
> 
> As Damien explained, the code for sending requeued requests to the I/O
> scheduler should be dropped because this is not necessary in order to
> prevent reordering of requeued zoned writes.
> 
> The remaining advantages of this patch series are:
> * Simplification of the block layer API since the
>   blk_mq_{,delay_}kick_requeue_list() functions are removed.
> * Faster requeuing. Although I don't have any use case that needs
>   faster requeuing, it is interesting to see that this patch series
>   makes the blktest test that tests requeuing run twice as fast.

Not sure whether that is a good thing, because requeueing is supposed to be
a slow path.

> 
> Is there agreement to proceed with this patch series if all review
> comments are addressed?

One concern is that you added the requeue fields to the hctx, which may
add one extra L1 cache fetch in the fast path.


Thanks,
Ming



