* [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
@ 2022-12-17  3:09 Yu Kuai
  2022-12-17  3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
                   ` (4 more replies)
  0 siblings, 5 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17  3:09 UTC
  To: tj, hch, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

iocost is initialized the first time it is configured, and this lazy
initialization can race with del_gendisk(), which will cause a null
pointer dereference:

t1				t2
ioc_qos_write
 blk_iocost_init
  rq_qos_add
  				del_gendisk
  				 rq_qos_exit
  				 //iocost is removed from q->rq_qos
  blkcg_activate_policy
   pd_init_fn
    ioc_pd_init
     ioc = q_to_ioc(blkg->q)
     //can't find iocost and return null
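
For reference, a simplified sketch of where t1 crashes, abbreviated
from the ioc_pd_init() flow above (illustrative, not the verbatim
kernel code):

static void ioc_pd_init(struct blkg_policy_data *pd)
{
	struct ioc_gq *iocg = pd_to_iocg(pd);
	struct blkcg_gq *blkg = pd_to_blkg(&iocg->pd);
	struct ioc *ioc = q_to_ioc(blkg->q);	/* looks up iocost in q->rq_qos */

	/*
	 * If t2's rq_qos_exit() has already emptied q->rq_qos, the
	 * lookup above resolves from a NULL rq_qos, and the first
	 * dereference of 'ioc' below crashes.
	 */
	spin_lock_irq(&ioc->lock);
	/* ... per-cgroup vtime setup elided ... */
	spin_unlock_irq(&ioc->lock);
}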

And iolatency is about to switch to the same lazy initialization.

This patchset fixes the problem by synchronizing rq_qos_add() and
blkcg_activate_policy() with rq_qos_exit().

Yu Kuai (4):
  block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
  block/rq_qos: fail rq_qos_add() after del_gendisk()
  blk-cgroup: add a new interface blkcg_conf_close_bdev()
  blk-cgroup: synchronize del_gendisk() with configuring cgroup policy

 block/blk-cgroup.c     | 12 ++++++++++--
 block/blk-cgroup.h     |  1 +
 block/blk-iocost.c     |  8 ++++----
 block/blk-rq-qos.c     | 27 ++++++++++++++++++++++-----
 block/blk-rq-qos.h     | 17 +++++++++++++----
 include/linux/blkdev.h |  1 +
 6 files changed, 51 insertions(+), 15 deletions(-)

-- 
2.31.1



* [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
  2022-12-17  3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
@ 2022-12-17  3:09 ` Yu Kuai
  2022-12-21 10:34   ` Christoph Hellwig
  2022-12-17  3:09 ` [PATCH -next 2/4] block/rq_qos: fail rq_qos_add() after del_gendisk() Yu Kuai
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2022-12-17  3:09 UTC
  To: tj, hch, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

queue_lock is held to protect 'q->rq_qos' in rq_qos_add() and
rq_qos_del(); however, it's not held in rq_qos_exit(), even though
these paths can operate on the list concurrently:

t1			t2
// configure iocost	// remove device
blk_iocost_init		del_gendisk
 rq_qos_add		 rq_qos_exit

'rqos->ops->exit' can't be called under the spinlock because it might
sleep. Hence fix the problem by holding queue_lock only to fetch the
list head and reset q->rq_qos, then walking the detached list unlocked.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-rq-qos.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 88f0fe7dcf54..efffc6fa55db 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -288,9 +288,18 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
 
 void rq_qos_exit(struct request_queue *q)
 {
-	while (q->rq_qos) {
-		struct rq_qos *rqos = q->rq_qos;
-		q->rq_qos = rqos->next;
-		rqos->ops->exit(rqos);
-	}
+	struct rq_qos *rqos;
+
+	spin_lock_irq(&q->queue_lock);
+	rqos = q->rq_qos;
+	q->rq_qos = NULL;
+	spin_unlock_irq(&q->queue_lock);
+
+	while (rqos) {
+		struct rq_qos *next = rqos->next;
+
+		if (rqos->ops->exit)
+			rqos->ops->exit(rqos);
+		rqos = next;
+	}
 }
-- 
2.31.1



* [PATCH -next 2/4] block/rq_qos: fail rq_qos_add() after del_gendisk()
  2022-12-17  3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
  2022-12-17  3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
@ 2022-12-17  3:09 ` Yu Kuai
  2022-12-17  3:09 ` [PATCH -next 3/4] blk-cgroup: add a new interface blkcg_conf_close_bdev() Yu Kuai
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17  3:09 UTC
  To: tj, hch, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

rq_qos_add() can still succeed after rq_qos_exit() is done, which
causes a memory leak because such a rq_qos will never be removed.

t1                      t2
// configure iocost
blk_iocost_init
			// remove device
			del_gendisk
			 rq_qos_exit
			 // does nothing because rq_qos doesn't exist yet
 rq_qos_add
 // will succeed, and rq_qos won't be removed

rq_qos_add() is also called from blkcg_init_disk(), via
__alloc_disk_node(), and the GD_DEAD flag can distinguish disk
allocation from disk deletion. Fix the problem by checking GD_DEAD in
rq_qos_add().

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-rq-qos.h | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 1ef1f7d4bc3c..81e087139ffb 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -87,6 +87,7 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
 
 static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
 {
+	int err;
 	/*
 	 * No IO can be in-flight when adding rqos, so freeze queue, which
 	 * is fine since we only support rq_qos for blk-mq queue.
@@ -97,8 +98,16 @@ static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
 	blk_mq_freeze_queue(q);
 
 	spin_lock_irq(&q->queue_lock);
-	if (rq_qos_id(q, rqos->id))
-		goto ebusy;
+	if (rq_qos_id(q, rqos->id)) {
+		err = -EBUSY;
+		goto err;
+	}
+
+	if (q->disk && test_bit(GD_DEAD, &q->disk->state)) {
+		err = -ENODEV;
+		goto err;
+	}
+
 	rqos->next = q->rq_qos;
 	q->rq_qos = rqos;
 	spin_unlock_irq(&q->queue_lock);
@@ -112,10 +121,10 @@ static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
 	}
 
 	return 0;
-ebusy:
+err:
 	spin_unlock_irq(&q->queue_lock);
 	blk_mq_unfreeze_queue(q);
-	return -EBUSY;
+	return err;
 
 }
 
-- 
2.31.1



* [PATCH -next 3/4] blk-cgroup: add a new interface blkcg_conf_close_bdev()
  2022-12-17  3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
  2022-12-17  3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
  2022-12-17  3:09 ` [PATCH -next 2/4] block/rq_qos: fail rq_qos_add() after del_gendisk() Yu Kuai
@ 2022-12-17  3:09 ` Yu Kuai
  2022-12-17  3:09 ` [PATCH -next 4/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
  2022-12-19 20:55 ` [PATCH -next 0/4] " Tejun Heo
  4 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17  3:09 UTC
  To: tj, hch, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

It's the same as blkdev_put_no_open() for now; this prepares for
synchronizing del_gendisk() with configuring cgroup policy. No
functional changes.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-cgroup.c | 9 +++++++--
 block/blk-cgroup.h | 1 +
 block/blk-iocost.c | 8 ++++----
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 77f44472b41e..ad612148cf3b 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -662,6 +662,11 @@ struct block_device *blkcg_conf_open_bdev(char **inputp)
 	return bdev;
 }
 
+void blkcg_conf_close_bdev(struct block_device *bdev)
+{
+	blkdev_put_no_open(bdev);
+}
+
 /**
  * blkg_conf_prep - parse and prepare for per-blkg config update
  * @blkcg: target block cgroup
@@ -781,7 +786,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 fail_exit_queue:
 	blk_queue_exit(q);
 fail:
-	blkdev_put_no_open(bdev);
+	blkcg_conf_close_bdev(bdev);
 	/*
 	 * If queue was bypassing, we should retry.  Do so after a
 	 * short msleep().  It isn't strictly necessary but queue
@@ -808,7 +813,7 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx)
 {
 	spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock);
 	rcu_read_unlock();
-	blkdev_put_no_open(ctx->bdev);
+	blkcg_conf_close_bdev(ctx->bdev);
 }
 EXPORT_SYMBOL_GPL(blkg_conf_finish);
 
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 1e94e404eaa8..d4ae1e7288c1 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -214,6 +214,7 @@ struct blkg_conf_ctx {
 };
 
 struct block_device *blkcg_conf_open_bdev(char **inputp);
+void blkcg_conf_close_bdev(struct block_device *bdev);
 int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
 		   char *input, struct blkg_conf_ctx *ctx);
 void blkg_conf_finish(struct blkg_conf_ctx *ctx);
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index d1bdc12deaa7..5294a404c892 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -3288,7 +3288,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 	blk_mq_unquiesce_queue(disk->queue);
 	blk_mq_unfreeze_queue(disk->queue);
 
-	blkdev_put_no_open(bdev);
+	blkcg_conf_close_bdev(bdev);
 	return nbytes;
 einval:
 	spin_unlock_irq(&ioc->lock);
@@ -3298,7 +3298,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 
 	ret = -EINVAL;
 err:
-	blkdev_put_no_open(bdev);
+	blkcg_conf_close_bdev(bdev);
 	return ret;
 }
 
@@ -3424,7 +3424,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 	blk_mq_unquiesce_queue(q);
 	blk_mq_unfreeze_queue(q);
 
-	blkdev_put_no_open(bdev);
+	blkcg_conf_close_bdev(bdev);
 	return nbytes;
 
 einval:
@@ -3435,7 +3435,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
 
 	ret = -EINVAL;
 err:
-	blkdev_put_no_open(bdev);
+	blkcg_conf_close_bdev(bdev);
 	return ret;
 }
 
-- 
2.31.1



* [PATCH -next 4/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
  2022-12-17  3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
                   ` (2 preceding siblings ...)
  2022-12-17  3:09 ` [PATCH -next 3/4] blk-cgroup: add a new interface blkcg_conf_close_bdev() Yu Kuai
@ 2022-12-17  3:09 ` Yu Kuai
  2022-12-19 20:55 ` [PATCH -next 0/4] " Tejun Heo
  4 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17  3:09 UTC
  To: tj, hch, josef, axboe
  Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang

From: Yu Kuai <yukuai3@huawei.com>

iocost is initialized the first time it is configured, and this lazy
initialization can race with del_gendisk(), which will cause a null
pointer dereference:

t1				t2
ioc_qos_write
 blk_iocost_init
  rq_qos_add
  				del_gendisk
  				 rq_qos_exit
  				 //iocost is removed from q->rq_qos
  blkcg_activate_policy
   pd_init_fn
    ioc_pd_init
     ioc = q_to_ioc(blkg->q)
     //can't find iocost and return null

Fix the problem by adding a new mutex to the request_queue, and use it
to synchronize rq_qos_exit() called from del_gendisk() with configuring
cgroup policy.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-cgroup.c     | 3 +++
 block/blk-rq-qos.c     | 8 ++++++++
 include/linux/blkdev.h | 1 +
 3 files changed, 12 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ad612148cf3b..8dcdaacb52a1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -658,12 +658,14 @@ struct block_device *blkcg_conf_open_bdev(char **inputp)
 		return ERR_PTR(-ENODEV);
 	}
 
+	mutex_lock(&bdev->bd_queue->blkcg_pols_lock);
 	*inputp = input;
 	return bdev;
 }
 
 void blkcg_conf_close_bdev(struct block_device *bdev)
 {
+	mutex_unlock(&bdev->bd_queue->blkcg_pols_lock);
 	blkdev_put_no_open(bdev);
 }
 
@@ -1277,6 +1279,7 @@ int blkcg_init_disk(struct gendisk *disk)
 	int ret;
 
 	INIT_LIST_HEAD(&q->blkg_list);
+	mutex_init(&q->blkcg_pols_lock);
 
 	new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
 	if (!new_blkg)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index efffc6fa55db..86bccdfa1a43 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -290,6 +290,10 @@ void rq_qos_exit(struct request_queue *q)
 {
 	struct rq_qos *rqos;
 
+#ifdef CONFIG_BLK_CGROUP
+	mutex_lock(&q->blkcg_pols_lock);
+#endif
+
 	spin_lock_irq(&q->queue_lock);
 	rqos = q->rq_qos;
 	q->rq_qos = NULL;
@@ -302,4 +306,8 @@ void rq_qos_exit(struct request_queue *q)
 			rqos->ops->exit(rqos);
 		rqos = next;
 	}
+
+#ifdef CONFIG_BLK_CGROUP
+	mutex_unlock(&q->blkcg_pols_lock);
+#endif
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 301cf1cf4f2f..824d68a41a83 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -484,6 +484,7 @@ struct request_queue {
 	DECLARE_BITMAP		(blkcg_pols, BLKCG_MAX_POLS);
 	struct blkcg_gq		*root_blkg;
 	struct list_head	blkg_list;
+	struct mutex		blkcg_pols_lock;
 #endif
 
 	struct queue_limits	limits;
-- 
2.31.1



* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
  2022-12-17  3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
                   ` (3 preceding siblings ...)
  2022-12-17  3:09 ` [PATCH -next 4/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
@ 2022-12-19 20:55 ` Tejun Heo
  2022-12-20  9:19   ` Yu Kuai
  4 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2022-12-19 20:55 UTC
  To: Yu Kuai
  Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yukuai3,
	yi.zhang

Hello,

On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
> 
> iocost is initialized the first time it is configured, and this lazy
> initialization can race with del_gendisk(), which will cause a null
> pointer dereference:
> 
> t1				t2
> ioc_qos_write
>  blk_iocost_init
>   rq_qos_add
>   				del_gendisk
>   				 rq_qos_exit
>   				 //iocost is removed from q->rq_qos
>   blkcg_activate_policy
>    pd_init_fn
>     ioc_pd_init
>      ioc = q_to_ioc(blkg->q)
>      //can't find iocost and return null
> 
> And iolatency is about to switch to the same lazy initialization.
> 
> This patchset fixes the problem by synchronizing rq_qos_add() and
> blkcg_activate_policy() with rq_qos_exit().

So, the patchset seems a bit overly complicated to me. What do you think
about the following?

* These init/exit paths are super cold paths, just protecting them with a
  global mutex is probably enough. If we encounter a scalability problem,
  it's easy to fix down the line.

* If we're synchronizing this with a mutex anyway, no need to grab the
  queue_lock, right? rq_qos_add/del/exit() can all just hold the mutex.

* And we can keep the state tracking within rq_qos. When rq_qos_exit() is
  called, mark it so that future adds will fail - be that a special ->next
  value, a queue flag, or whatever. A rough sketch of the idea follows.
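
  A minimal sketch, just to illustrate (rq_qos_mutex and
  QUEUE_FLAG_QOS_DEAD are made-up names rather than existing symbols,
  and the queue freezing that rq_qos_add() currently does is omitted):

	static DEFINE_MUTEX(rq_qos_mutex);

	int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
	{
		int ret = 0;

		mutex_lock(&rq_qos_mutex);
		if (test_bit(QUEUE_FLAG_QOS_DEAD, &q->queue_flags)) {
			/* rq_qos_exit() already ran, fail future adds */
			ret = -ENODEV;
		} else if (rq_qos_id(q, rqos->id)) {
			ret = -EBUSY;
		} else {
			rqos->next = q->rq_qos;
			q->rq_qos = rqos;
		}
		mutex_unlock(&rq_qos_mutex);
		return ret;
	}

	void rq_qos_exit(struct request_queue *q)
	{
		mutex_lock(&rq_qos_mutex);
		set_bit(QUEUE_FLAG_QOS_DEAD, &q->queue_flags);
		while (q->rq_qos) {
			struct rq_qos *rqos = q->rq_qos;

			q->rq_qos = rqos->next;
			if (rqos->ops->exit)
				rqos->ops->exit(rqos);
		}
		mutex_unlock(&rq_qos_mutex);
	}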

Thanks.

-- 
tejun


* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
  2022-12-19 20:55 ` [PATCH -next 0/4] " Tejun Heo
@ 2022-12-20  9:19   ` Yu Kuai
  2022-12-20 16:01     ` Tejun Heo
  2022-12-21 10:37     ` Christoph Hellwig
  0 siblings, 2 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-20  9:19 UTC
  To: Tejun Heo, Yu Kuai
  Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
	yukuai (C)

Hi,

On 2022/12/20 4:55, Tejun Heo wrote:
> Hello,
> 
> On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> iocost is initialized the first time it is configured, and this lazy
>> initialization can race with del_gendisk(), which will cause a null
>> pointer dereference:
>>
>> t1				t2
>> ioc_qos_write
>>   blk_iocost_init
>>    rq_qos_add
>>    				del_gendisk
>>    				 rq_qos_exit
>>    				 //iocost is removed from q->rq_qos
>>    blkcg_activate_policy
>>     pd_init_fn
>>      ioc_pd_init
>>       ioc = q_to_ioc(blkg->q)
>>       //can't find iocost and return null
>>
>> And iolatency is about to switch to the same lazy initialization.
>>
>> This patchset fixes the problem by synchronizing rq_qos_add() and
>> blkcg_activate_policy() with rq_qos_exit().
> 
> So, the patchset seems a bit overly complicated to me. What do you think
> about the following?
> 
> * These init/exit paths are super cold paths, just protecting them with a
>    global mutex is probably enough. If we encounter a scalability problem,
>    it's easy to fix down the line.
> 
> * If we're synchronizing this with a mutex anyway, no need to grab the
>    queue_lock, right? rq_qos_add/del/exit() can all just hold the mutex.
> 
> * And we can keep the state tracking within rq_qos. When rq_qos_exit() is
>    called, mark it so that future adds will fail - be that a special ->next
>    value, a queue flag, or whatever.

Yes, that sounds good. BTW, queue_lock is also used to protect
pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
problematic:

blkcg_activate_policy
  spin_lock_irq(&q->queue_lock);
  list_for_each_entry_reverse(blkg, &q->blkg_list
   pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed

   spin_unlock_irq(&q->queue_lock);
   // releasing queue_lock here is problematic: it can cause
pd_offline_fn() to be called without pd_init_fn().
   pd_alloc_fn(__GFP_NOWARN,...)

If we are using a mutex to protect the rq_qos ops, it seems the right
thing to do to also use the mutex to protect the blkcg_policy ops, and
this problem can be fixed because a mutex can be held while allocating
memory with GFP_KERNEL. What do you think?
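
For example, a very rough sketch of that direction, reusing the
blkcg_pols_lock mutex from patch 4 (blkg_list locking and failure
rollback are glossed over here):

int blkcg_activate_policy(struct request_queue *q,
			  const struct blkcg_policy *pol)
{
	struct blkcg_gq *blkg;
	int ret = 0;

	if (blkcg_policy_enabled(q, pol))
		return 0;

	mutex_lock(&q->blkcg_pols_lock);
	list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
		struct blkg_policy_data *pd;

		/* we can sleep now, so no GFP_NOWAIT fallback dance */
		pd = pol->pd_alloc_fn(GFP_KERNEL, q, blkg->blkcg);
		if (!pd) {
			ret = -ENOMEM;	/* rollback of allocated pds elided */
			goto out;
		}
		blkg->pd[pol->plid] = pd;
		pd->blkg = blkg;
		pd->plid = pol->plid;
		if (pol->pd_init_fn)
			pol->pd_init_fn(pd);
	}
	__set_bit(pol->plid, q->blkcg_pols);
out:
	mutex_unlock(&q->blkcg_pols_lock);
	return ret;
}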

Thanks,
Kuai
> 
> Thanks.
> 



* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
  2022-12-20  9:19   ` Yu Kuai
@ 2022-12-20 16:01     ` Tejun Heo
  2022-12-21  1:10       ` Yu Kuai
  2022-12-21 10:37     ` Christoph Hellwig
  1 sibling, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2022-12-20 16:01 UTC
  To: Yu Kuai
  Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
	yukuai (C)

Hello,

On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> Yes, that sounds good. BTW, queue_lock is also used to protect
> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
> problematic:
> 
> blkcg_activate_policy
>  spin_lock_irq(&q->queue_lock);
>  list_for_each_entry_reverse(blkg, &q->blkg_list
>   pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed
> 
>   spin_unlock_irq(&q->queue_lock);
>   // releasing queue_lock here is problematic: it can cause
> pd_offline_fn() to be called without pd_init_fn().
>   pd_alloc_fn(__GFP_NOWARN,...)

So, if a blkg is destroyed while a policy is being activated, right?

> If we are using a mutex to protect the rq_qos ops, it seems the right
> thing to do to also use the mutex to protect the blkcg_policy ops, and
> this problem can be fixed because a mutex can be held while allocating
> memory with GFP_KERNEL. What do you think?

One worry is that switching to a mutex can be more of a headache due to
destroy path synchronization. Another approach would be using a per-blkg
flag to track whether a blkg has been initialized.
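
E.g. something along these lines, where the 'initialized' field and the
helper are hypothetical, just to illustrate:

struct blkg_policy_data {
	/* the blkg and policy id this per-policy data belongs to */
	struct blkcg_gq		*blkg;
	int			plid;
	bool			initialized;	/* hypothetical: set after pd_init_fn() */
};

/* destroy path: only offline pds that actually completed pd_init_fn() */
static void blkg_pd_offline(struct blkcg_gq *blkg, int i)
{
	const struct blkcg_policy *pol = blkcg_policy[i];

	if (blkg->pd[i] && blkg->pd[i]->initialized && pol->pd_offline_fn)
		pol->pd_offline_fn(blkg->pd[i]);
}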

Thanks.

-- 
tejun


* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
  2022-12-20 16:01     ` Tejun Heo
@ 2022-12-21  1:10       ` Yu Kuai
  2022-12-21  2:48         ` Yu Kuai
  0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2022-12-21  1:10 UTC
  To: Tejun Heo, Yu Kuai
  Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
	yukuai (C)

Hi,

On 2022/12/21 0:01, Tejun Heo wrote:
> Hello,
> 
> On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
>> Yes, that sounds good. BTW, queue_lock is also used to protect
>> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
>> problematic:
>>
>> blkcg_activate_policy
>>   spin_lock_irq(&q->queue_lock);
>>   list_for_each_entry_reverse(blkg, &q->blkg_list
>>    pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed
>>
>>    spin_unlock_irq(&q->queue_lock);
>>    // releasing queue_lock here is problematic: it can cause
>> pd_offline_fn() to be called without pd_init_fn().
>>    pd_alloc_fn(__GFP_NOWARN,...)
> 
> So, if a blkg is destroyed while a policy is being activated, right?
Yes, removing a cgroup can race with this; for bfq, a null pointer
dereference will be triggered in bfq_pd_offline().

> 
>> If we are using a mutex to protect the rq_qos ops, it seems the right
>> thing to do to also use the mutex to protect the blkcg_policy ops, and
>> this problem can be fixed because a mutex can be held while allocating
>> memory with GFP_KERNEL. What do you think?
> 
> One worry is that switching to a mutex can be more of a headache due to
> destroy path synchronization. Another approach would be using a per-blkg
> flag to track whether a blkg has been initialized.
I think perhaps you mean a per-blkg_policy_data flag? A per-blkg flag
would not work in this case.

Thanks,
Kuai
> 
> Thanks.
> 



* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
  2022-12-21  1:10       ` Yu Kuai
@ 2022-12-21  2:48         ` Yu Kuai
  0 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-21  2:48 UTC
  To: Yu Kuai, Tejun Heo
  Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
	yukuai (C)

Hi,

On 2022/12/21 9:10, Yu Kuai wrote:
> Hi,
> 
> On 2022/12/21 0:01, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
>>> Yes, that sounds good. BTW, queue_lock is also used to protect
>>> pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
>>> problematic:
>>>
>>> blkcg_activate_policy
>>>   spin_lock_irq(&q->queue_lock);
>>>   list_for_each_entry_reverse(blkg, &q->blkg_list
>>>    pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed
>>>
>>>    spin_unlock_irq(&q->queue_lock);
>>>    // releasing queue_lock here is problematic: it can cause
>>> pd_offline_fn() to be called without pd_init_fn().
>>>    pd_alloc_fn(__GFP_NOWARN,...)
>>
>> So, if a blkg is destroyed while a policy is being activated, right?
> Yes, removing a cgroup can race with this; for bfq, a null pointer
> dereference will be triggered in bfq_pd_offline().

BTW, we just found that pd_online_fn() is missing from
blkcg_activate_policy()... Currently this is not a problem because only
blk-throttle implements it, and blk-throttle is activated when the blkg
is created.
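
An illustrative sketch of the missing pass, mirroring the existing
pd_init_fn() loop in blkcg_activate_policy() (untested):

	/*
	 * After all pds are allocated and pd_init_fn() has run, an
	 * online pass would also be needed, mirroring what
	 * blkg_create() does for policies that are already active:
	 */
	if (pol->pd_online_fn)
		list_for_each_entry_reverse(blkg, &q->blkg_list, q_node)
			pol->pd_online_fn(blkg->pd[pol->plid]);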

Thanks,
Kuai
> 
>>
>>> If we are using a mutex to protect the rq_qos ops, it seems the right
>>> thing to do to also use the mutex to protect the blkcg_policy ops, and
>>> this problem can be fixed because a mutex can be held while allocating
>>> memory with GFP_KERNEL. What do you think?
>>
>> One worry is that switching to a mutex can be more of a headache due to
>> destroy path synchronization. Another approach would be using a per-blkg
>> flag to track whether a blkg has been initialized.
> I think perhaps you mean a per-blkg_policy_data flag? A per-blkg flag
> would not work in this case.
> 
> Thanks,
> Kuai
>>
>> Thanks.
>>
> 
> .
> 



* Re: [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
  2022-12-17  3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
@ 2022-12-21 10:34   ` Christoph Hellwig
  0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-12-21 10:34 UTC
  To: Yu Kuai
  Cc: tj, hch, josef, axboe, cgroups, linux-block, linux-kernel,
	yukuai3, yi.zhang

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
  2022-12-20  9:19   ` Yu Kuai
  2022-12-20 16:01     ` Tejun Heo
@ 2022-12-21 10:37     ` Christoph Hellwig
  1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-12-21 10:37 UTC
  To: Yu Kuai
  Cc: Tejun Heo, hch, josef, axboe, cgroups, linux-block, linux-kernel,
	yi.zhang, yukuai (C)

On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> If we are using a mutex to protect the rq_qos ops, it seems the right
> thing to do to also use the mutex to protect the blkcg_policy ops, and
> this problem can be fixed because a mutex can be held while allocating
> memory with GFP_KERNEL. What do you think?

Getting rid of the atomic allocations would be awesome.

FYI, I'm also in favor of everything that moves things out of
queue_lock into more dedicated locks.  queue_lock is such an undocumented
mess of untargeted things that aren't related to each other right now
that splitting it up and removing it is becoming more and more important.
