* [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
@ 2022-12-17 3:09 Yu Kuai
2022-12-17 3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17 3:09 UTC
To: tj, hch, josef, axboe
Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang
From: Yu Kuai <yukuai3@huawei.com>
iocost is initialized the first time it is configured, and that
initialization can race with del_gendisk(), which will cause a null
pointer dereference:

t1                              t2
ioc_qos_write
 blk_iocost_init
  rq_qos_add
                                del_gendisk
                                 rq_qos_exit
                                 // iocost is removed from q->rq_qos
  blkcg_activate_policy
   pd_init_fn
    ioc_pd_init
     ioc = q_to_ioc(blkg->q)
     // can't find iocost and returns null

And iolatency is about to switch to the same lazy initialization.

This patchset fixes the problem by synchronizing rq_qos_add() and
blkcg_activate_policy() with rq_qos_exit().
Yu Kuai (4):
block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
block/rq_qos: fail rq_qos_add() after del_gendisk()
blk-cgroup: add a new interface blkcg_conf_close_bdev()
blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
block/blk-cgroup.c | 12 ++++++++++--
block/blk-cgroup.h | 1 +
block/blk-iocost.c | 8 ++++----
block/blk-rq-qos.c | 25 ++++++++++++++++++++-----
block/blk-rq-qos.h | 17 +++++++++++++----
include/linux/blkdev.h | 1 +
6 files changed, 49 insertions(+), 15 deletions(-)
--
2.31.1
* [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
2022-12-17 3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
@ 2022-12-17 3:09 ` Yu Kuai
2022-12-21 10:34 ` Christoph Hellwig
2022-12-17 3:09 ` [PATCH -next 2/4] block/rq_qos: fail rq_qos_add() after del_gendisk() Yu Kuai
` (3 subsequent siblings)
4 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2022-12-17 3:09 UTC
To: tj, hch, josef, axboe
Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang
From: Yu Kuai <yukuai3@huawei.com>
queue_lock is held to protect 'q->rq_qos' in rq_qos_add() and
rq_qos_del(); however, it is not held in rq_qos_exit(), so the list can
be operated on concurrently:

t1                      t2
// configure iocost     // remove device
blk_iocost_init         del_gendisk
 rq_qos_add              rq_qos_exit

'rqos->ops->exit' can't be called under the spinlock because it might
sleep. Fix the problem by holding queue_lock to detach the whole list
and reset q->rq_qos first, then calling the exit callbacks outside the
lock.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-rq-qos.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 88f0fe7dcf54..efffc6fa55db 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -288,9 +288,16 @@ void rq_qos_wait(struct rq_wait *rqw, void *private_data,
void rq_qos_exit(struct request_queue *q)
{
- while (q->rq_qos) {
- struct rq_qos *rqos = q->rq_qos;
- q->rq_qos = rqos->next;
- rqos->ops->exit(rqos);
- }
+ struct rq_qos *rqos;
+
+ spin_lock_irq(&q->queue_lock);
+ rqos = q->rq_qos;
+ q->rq_qos = NULL;
+ spin_unlock_irq(&q->queue_lock);
+
+	while (rqos) {
+		if (rqos->ops->exit)
+			rqos->ops->exit(rqos);
+		rqos = rqos->next;
+	}
}
--
2.31.1
* [PATCH -next 2/4] block/rq_qos: fail rq_qos_add() after del_gendisk()
2022-12-17 3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
2022-12-17 3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
@ 2022-12-17 3:09 ` Yu Kuai
2022-12-17 3:09 ` [PATCH -next 3/4] blk-cgroup: add a new interface blkcg_conf_close_bdev() Yu Kuai
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17 3:09 UTC
To: tj, hch, josef, axboe
Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang
From: Yu Kuai <yukuai3@huawei.com>
rq_qos_add() can still succeed after rq_qos_exit() is done, which will
cause a memory leak because such an rq_qos will never be removed:

t1                    t2
// configure iocost
blk_iocost_init
                      // remove device
                      del_gendisk
                       rq_qos_exit
                       // does nothing because no rq_qos exists yet
 rq_qos_add
 // will succeed, and the rq_qos will never be removed

rq_qos_add() is also called from blkcg_init_disk() via
__alloc_disk_node(), and GD_DEAD can distinguish disk allocation from
disk deletion. Fix the problem by checking GD_DEAD in rq_qos_add().
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-rq-qos.h | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 1ef1f7d4bc3c..81e087139ffb 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -87,6 +87,7 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
{
+ int err;
/*
* No IO can be in-flight when adding rqos, so freeze queue, which
* is fine since we only support rq_qos for blk-mq queue.
@@ -97,8 +98,16 @@ static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
blk_mq_freeze_queue(q);
spin_lock_irq(&q->queue_lock);
- if (rq_qos_id(q, rqos->id))
- goto ebusy;
+ if (rq_qos_id(q, rqos->id)) {
+ err = -EBUSY;
+ goto err;
+ }
+
+ if (q->disk && test_bit(GD_DEAD, &q->disk->state)) {
+ err = -ENODEV;
+ goto err;
+ }
+
rqos->next = q->rq_qos;
q->rq_qos = rqos;
spin_unlock_irq(&q->queue_lock);
@@ -112,10 +121,10 @@ static inline int rq_qos_add(struct request_queue *q, struct rq_qos *rqos)
}
return 0;
-ebusy:
+err:
spin_unlock_irq(&q->queue_lock);
blk_mq_unfreeze_queue(q);
- return -EBUSY;
+ return err;
}
--
2.31.1
* [PATCH -next 3/4] blk-cgroup: add a new interface blkcg_conf_close_bdev()
2022-12-17 3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
2022-12-17 3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
2022-12-17 3:09 ` [PATCH -next 2/4] block/rq_qos: fail rq_qos_add() after del_gendisk() Yu Kuai
@ 2022-12-17 3:09 ` Yu Kuai
2022-12-17 3:09 ` [PATCH -next 4/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
2022-12-19 20:55 ` [PATCH -next 0/4] " Tejun Heo
4 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17 3:09 UTC
To: tj, hch, josef, axboe
Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang
From: Yu Kuai <yukuai3@huawei.com>
It's the same as blkdev_put_no_open() for now; this prepares for
synchronizing del_gendisk() with configuring cgroup policy. No
functional changes.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-cgroup.c | 9 +++++++--
block/blk-cgroup.h | 1 +
block/blk-iocost.c | 8 ++++----
3 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 77f44472b41e..ad612148cf3b 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -662,6 +662,11 @@ struct block_device *blkcg_conf_open_bdev(char **inputp)
return bdev;
}
+void blkcg_conf_close_bdev(struct block_device *bdev)
+{
+ blkdev_put_no_open(bdev);
+}
+
/**
* blkg_conf_prep - parse and prepare for per-blkg config update
* @blkcg: target block cgroup
@@ -781,7 +786,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
fail_exit_queue:
blk_queue_exit(q);
fail:
- blkdev_put_no_open(bdev);
+ blkcg_conf_close_bdev(bdev);
/*
* If queue was bypassing, we should retry. Do so after a
* short msleep(). It isn't strictly necessary but queue
@@ -808,7 +813,7 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx)
{
spin_unlock_irq(&bdev_get_queue(ctx->bdev)->queue_lock);
rcu_read_unlock();
- blkdev_put_no_open(ctx->bdev);
+ blkcg_conf_close_bdev(ctx->bdev);
}
EXPORT_SYMBOL_GPL(blkg_conf_finish);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index 1e94e404eaa8..d4ae1e7288c1 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -214,6 +214,7 @@ struct blkg_conf_ctx {
};
struct block_device *blkcg_conf_open_bdev(char **inputp);
+void blkcg_conf_close_bdev(struct block_device *bdev);
int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol,
char *input, struct blkg_conf_ctx *ctx);
void blkg_conf_finish(struct blkg_conf_ctx *ctx);
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index d1bdc12deaa7..5294a404c892 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -3288,7 +3288,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
blk_mq_unquiesce_queue(disk->queue);
blk_mq_unfreeze_queue(disk->queue);
- blkdev_put_no_open(bdev);
+ blkcg_conf_close_bdev(bdev);
return nbytes;
einval:
spin_unlock_irq(&ioc->lock);
@@ -3298,7 +3298,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
ret = -EINVAL;
err:
- blkdev_put_no_open(bdev);
+ blkcg_conf_close_bdev(bdev);
return ret;
}
@@ -3424,7 +3424,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
blk_mq_unquiesce_queue(q);
blk_mq_unfreeze_queue(q);
- blkdev_put_no_open(bdev);
+ blkcg_conf_close_bdev(bdev);
return nbytes;
einval:
@@ -3435,7 +3435,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
ret = -EINVAL;
err:
- blkdev_put_no_open(bdev);
+ blkcg_conf_close_bdev(bdev);
return ret;
}
--
2.31.1
* [PATCH -next 4/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
2022-12-17 3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
` (2 preceding siblings ...)
2022-12-17 3:09 ` [PATCH -next 3/4] blk-cgroup: add a new interface blkcg_conf_close_bdev() Yu Kuai
@ 2022-12-17 3:09 ` Yu Kuai
2022-12-19 20:55 ` [PATCH -next 0/4] " Tejun Heo
4 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-17 3:09 UTC
To: tj, hch, josef, axboe
Cc: cgroups, linux-block, linux-kernel, yukuai3, yukuai1, yi.zhang
From: Yu Kuai <yukuai3@huawei.com>
iocost is initialized the first time it is configured, and that
initialization can race with del_gendisk(), which will cause a null
pointer dereference:

t1                              t2
ioc_qos_write
 blk_iocost_init
  rq_qos_add
                                del_gendisk
                                 rq_qos_exit
                                 // iocost is removed from q->rq_qos
  blkcg_activate_policy
   pd_init_fn
    ioc_pd_init
     ioc = q_to_ioc(blkg->q)
     // can't find iocost and returns null

Fix the problem by adding a new mutex to request_queue, and use it to
synchronize rq_qos_exit() from del_gendisk() with configuring cgroup
policy.
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
block/blk-cgroup.c | 3 +++
block/blk-rq-qos.c | 8 ++++++++
include/linux/blkdev.h | 1 +
3 files changed, 12 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ad612148cf3b..8dcdaacb52a1 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -658,12 +658,14 @@ struct block_device *blkcg_conf_open_bdev(char **inputp)
return ERR_PTR(-ENODEV);
}
+ mutex_lock(&bdev->bd_queue->blkcg_pols_lock);
*inputp = input;
return bdev;
}
void blkcg_conf_close_bdev(struct block_device *bdev)
{
+ mutex_unlock(&bdev->bd_queue->blkcg_pols_lock);
blkdev_put_no_open(bdev);
}
@@ -1277,6 +1279,7 @@ int blkcg_init_disk(struct gendisk *disk)
int ret;
INIT_LIST_HEAD(&q->blkg_list);
+ mutex_init(&q->blkcg_pols_lock);
new_blkg = blkg_alloc(&blkcg_root, disk, GFP_KERNEL);
if (!new_blkg)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index efffc6fa55db..86bccdfa1a43 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -290,6 +290,10 @@ void rq_qos_exit(struct request_queue *q)
{
struct rq_qos *rqos;
+#ifdef CONFIG_BLK_CGROUP
+ mutex_lock(&q->blkcg_pols_lock);
+#endif
+
spin_lock_irq(&q->queue_lock);
rqos = q->rq_qos;
q->rq_qos = NULL;
@@ -300,4 +304,8 @@ void rq_qos_exit(struct request_queue *q)
rqos->ops->exit(rqos);
rqos = rqos->next;
	}
+
+#ifdef CONFIG_BLK_CGROUP
+ mutex_unlock(&q->blkcg_pols_lock);
+#endif
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 301cf1cf4f2f..824d68a41a83 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -484,6 +484,7 @@ struct request_queue {
DECLARE_BITMAP (blkcg_pols, BLKCG_MAX_POLS);
struct blkcg_gq *root_blkg;
struct list_head blkg_list;
+ struct mutex blkcg_pols_lock;
#endif
struct queue_limits limits;
--
2.31.1
* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
2022-12-17 3:09 [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
` (3 preceding siblings ...)
2022-12-17 3:09 ` [PATCH -next 4/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy Yu Kuai
@ 2022-12-19 20:55 ` Tejun Heo
2022-12-20 9:19 ` Yu Kuai
4 siblings, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2022-12-19 20:55 UTC
To: Yu Kuai
Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yukuai3,
yi.zhang
Hello,
On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> iocost is initialized when it's configured the first time, and iocost
> initializing can race with del_gendisk(), which will cause null pointer
> dereference:
>
> t1 t2
> ioc_qos_write
> blk_iocost_init
> rq_qos_add
> del_gendisk
> rq_qos_exit
> //iocost is removed from q->roqs
> blkcg_activate_policy
> pd_init_fn
> ioc_pd_init
> ioc = q_to_ioc(blkg->q)
> //can't find iocost and return null
>
> And iolatency is about to switch to the same lazy initialization.
>
> This patchset fix this problem by synchronize rq_qos_add() and
> blkcg_activate_policy() with rq_qos_exit().
So, the patchset seems a bit overly complicated to me. What do you think
about the following?
* These init/exit paths are super cold path, just protecting them with a
global mutex is probably enough. If we encounter a scalability problem,
it's easy to fix down the line.
* If we're synchronizing this with a mutex anyway, no need to grab the
queue_lock, right? rq_qos_add/del/exit() can all just hold the mutex.
* And we can keep the state tracking within rq_qos. When rq_qos_exit() is
called, mark it so that future adds will fail - be that a special ->next
value, a queue flag, or whatever.
Thanks.
--
tejun
* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
2022-12-19 20:55 ` [PATCH -next 0/4] " Tejun Heo
@ 2022-12-20 9:19 ` Yu Kuai
2022-12-20 16:01 ` Tejun Heo
2022-12-21 10:37 ` Christoph Hellwig
0 siblings, 2 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-20 9:19 UTC
To: Tejun Heo, Yu Kuai
Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
yukuai (C)
Hi,
On 2022/12/20 4:55, Tejun Heo wrote:
> Hello,
>
> On Sat, Dec 17, 2022 at 11:09:04AM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> iocost is initialized when it's configured the first time, and iocost
>> initializing can race with del_gendisk(), which will cause null pointer
>> dereference:
>>
>> t1 t2
>> ioc_qos_write
>> blk_iocost_init
>> rq_qos_add
>> del_gendisk
>> rq_qos_exit
>> //iocost is removed from q->roqs
>> blkcg_activate_policy
>> pd_init_fn
>> ioc_pd_init
>> ioc = q_to_ioc(blkg->q)
>> //can't find iocost and return null
>>
>> And iolatency is about to switch to the same lazy initialization.
>>
>> This patchset fix this problem by synchronize rq_qos_add() and
>> blkcg_activate_policy() with rq_qos_exit().
>
> So, the patchset seems a bit overly complicated to me. What do you think
> about the following?
>
> * These init/exit paths are super cold path, just protecting them with a
> global mutex is probably enough. If we encounter a scalability problem,
> it's easy to fix down the line.
>
> * If we're synchronizing this with a mutex anyway, no need to grab the
> queue_lock, right? rq_qos_add/del/exit() can all just hold the mutex.
>
> * And we can keep the state tracking within rq_qos. When rq_qos_exit() is
> called, mark it so that future adds will fail - be that a special ->next
> value a queue flag or whatever.
Yes, that sounds good. BTW, queue_lock is also used to protect
pd_alloc_fn/pd_init_fn, and we found that blkcg_activate_policy() is
problematic:

blkcg_activate_policy
 spin_lock_irq(&q->queue_lock);
 list_for_each_entry_reverse(blkg, &q->blkg_list)
  pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN, ...) -> failed
  spin_unlock_irq(&q->queue_lock);
  // releasing queue_lock here is problematic: this can cause
  // pd_offline_fn to be called without pd_init_fn
  pd_alloc_fn(__GFP_NOWARN, ...)

If we are using a mutex to protect the rq_qos ops, it seems the right
thing to also use the mutex to protect the blkcg_policy ops, and this
problem can be fixed because a mutex can be held while allocating memory
with GFP_KERNEL. What do you think?
Thanks,
Kuai
>
> Thanks.
>
* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
2022-12-20 9:19 ` Yu Kuai
@ 2022-12-20 16:01 ` Tejun Heo
2022-12-21 1:10 ` Yu Kuai
2022-12-21 10:37 ` Christoph Hellwig
1 sibling, 1 reply; 12+ messages in thread
From: Tejun Heo @ 2022-12-20 16:01 UTC
To: Yu Kuai
Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
yukuai (C)
Hello,
On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> Yes, that sounds good. BTW, queue_lock is also used to protect
> pd_alloc_fn/pd_init_fn,and we found that blkcg_activate_policy() is
> problematic:
>
> blkcg_activate_policy
> spin_lock_irq(&q->queue_lock);
> list_for_each_entry_reverse(blkg, &q->blkg_list
> pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed
>
> spin_unlock_irq(&q->queue_lock);
> // release queue_lock here is problematic, this will cause
> pd_offline_fn called without pd_init_fn.
> pd_alloc_fn(__GFP_NOWARN,...)
So, if a blkg is destroyed while a policy is being activated, right?
> If we are using a mutex to protect rq_qos ops, it seems the right thing
> to do do also using the mutex to protect blkcg_policy ops, and this
> problem can be fixed because mutex can be held to alloc memroy with
> GFP_KERNEL. What do you think?
One worry is that switching to a mutex can be more of a headache due to
destroy path synchronization. Another approach would be using a per-blkg
flag to track whether a blkg has been initialized.
Thanks.
--
tejun
* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
2022-12-20 16:01 ` Tejun Heo
@ 2022-12-21 1:10 ` Yu Kuai
2022-12-21 2:48 ` Yu Kuai
0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2022-12-21 1:10 UTC
To: Tejun Heo, Yu Kuai
Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
yukuai (C)
Hi,
On 2022/12/21 0:01, Tejun Heo wrote:
> Hello,
>
> On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
>> Yes, that sounds good. BTW, queue_lock is also used to protect
>> pd_alloc_fn/pd_init_fn,and we found that blkcg_activate_policy() is
>> problematic:
>>
>> blkcg_activate_policy
>> spin_lock_irq(&q->queue_lock);
>> list_for_each_entry_reverse(blkg, &q->blkg_list
>> pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed
>>
>> spin_unlock_irq(&q->queue_lock);
>> // release queue_lock here is problematic, this will cause
>> pd_offline_fn called without pd_init_fn.
>> pd_alloc_fn(__GFP_NOWARN,...)
>
> So, if a blkg is destroyed while a policy is being activated, right?
Yes, removing a cgroup can race with this; for bfq a null pointer
dereference will be triggered in bfq_pd_offline().
>
>> If we are using a mutex to protect rq_qos ops, it seems the right thing
>> to do do also using the mutex to protect blkcg_policy ops, and this
>> problem can be fixed because mutex can be held to alloc memroy with
>> GFP_KERNEL. What do you think?
>
> One worry is that switching to mutex can be more headache due to destroy
> path synchronization. Another approach would be using a per-blkg flag to
> track whether a blkg has been initialized.
I think perhaps you mean a per-blkg_policy_data flag? A per-blkg flag
would not work in this case.
Thanks,
Kuai
>
> Thanks.
>
* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
2022-12-21 1:10 ` Yu Kuai
@ 2022-12-21 2:48 ` Yu Kuai
0 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2022-12-21 2:48 UTC
To: Yu Kuai, Tejun Heo
Cc: hch, josef, axboe, cgroups, linux-block, linux-kernel, yi.zhang,
yukuai (C)
Hi,
On 2022/12/21 9:10, Yu Kuai wrote:
> Hi,
>
> On 2022/12/21 0:01, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
>>> Yes, that sounds good. BTW, queue_lock is also used to protect
>>> pd_alloc_fn/pd_init_fn,and we found that blkcg_activate_policy() is
>>> problematic:
>>>
>>> blkcg_activate_policy
>>> spin_lock_irq(&q->queue_lock);
>>> list_for_each_entry_reverse(blkg, &q->blkg_list
>>> pd_alloc_fn(GFP_NOWAIT | __GFP_NOWARN,...) -> failed
>>>
>>> spin_unlock_irq(&q->queue_lock);
>>> // release queue_lock here is problematic, this will cause
>>> pd_offline_fn called without pd_init_fn.
>>> pd_alloc_fn(__GFP_NOWARN,...)
>>
>> So, if a blkg is destroyed while a policy is being activated, right?
> Yes, remove cgroup can race with this, for bfq null pointer deference
> will be triggered in bfq_pd_offline().
BTW, we just found that pd_online_fn() is missing from
blkcg_activate_policy()... Currently this is not a problem because only
blk-throttle implements it, and blk-throttle is activated while the blkg
is created.
Thanks,
Kuai
>
>>
>>> If we are using a mutex to protect rq_qos ops, it seems the right thing
>>> to do do also using the mutex to protect blkcg_policy ops, and this
>>> problem can be fixed because mutex can be held to alloc memroy with
>>> GFP_KERNEL. What do you think?
>>
>> One worry is that switching to mutex can be more headache due to destroy
>> path synchronization. Another approach would be using a per-blkg flag to
>> track whether a blkg has been initialized.
> I think perhaps you mean per blkg_policy_data flag? per blkg flag should
> not work in this case.
>
> Thanks,
> Kuai
>>
>> Thanks.
>>
>
> .
>
* Re: [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit()
2022-12-17 3:09 ` [PATCH -next 1/4] block/rq_qos: protect 'q->rq_qos' with queue_lock in rq_qos_exit() Yu Kuai
@ 2022-12-21 10:34 ` Christoph Hellwig
0 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-12-21 10:34 UTC
To: Yu Kuai
Cc: tj, hch, josef, axboe, cgroups, linux-block, linux-kernel,
yukuai3, yi.zhang
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH -next 0/4] blk-cgroup: synchronize del_gendisk() with configuring cgroup policy
2022-12-20 9:19 ` Yu Kuai
2022-12-20 16:01 ` Tejun Heo
@ 2022-12-21 10:37 ` Christoph Hellwig
1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2022-12-21 10:37 UTC
To: Yu Kuai
Cc: Tejun Heo, hch, josef, axboe, cgroups, linux-block, linux-kernel,
yi.zhang, yukuai (C)
On Tue, Dec 20, 2022 at 05:19:12PM +0800, Yu Kuai wrote:
> If we are using a mutex to protect rq_qos ops, it seems the right thing
> to do do also using the mutex to protect blkcg_policy ops, and this
> problem can be fixed because mutex can be held to alloc memroy with
> GFP_KERNEL. What do you think?
Getting rid of the atomic allocations would be awesome.
FYI, I'm also in favor of everything that moves things out of
queue_lock into more dedicated locks. queue_lock is such an
undocumented mess of unrelated things right now that splitting it up
and removing it is becoming more and more important.