Linux-Block Archive mirror
 help / color / mirror / Atom feed
From: Jens Axboe <axboe@kernel.dk>
To: Damien Le Moal <dlemoal@kernel.org>, linux-block@vger.kernel.org
Subject: Re: [PATCH v2 1/2] block: prevent freeing a zone write plug too early
Date: Wed, 24 Apr 2024 07:58:22 -0600	[thread overview]
Message-ID: <2787b6ed-c5a2-4f5a-9214-3fab75bfe1ee@kernel.dk> (raw)
In-Reply-To: <61ff5a29-9b16-4a52-b0ae-cfba2ea3c748@kernel.org>

On 4/23/24 12:19 PM, Damien Le Moal wrote:
> On 2024/04/24 1:36, Jens Axboe wrote:
>> On 4/23/24 9:16 AM, Damien Le Moal wrote:
>>> On 2024/04/24 0:21, Jens Axboe wrote:
>>>> On 4/20/24 1:58 AM, Damien Le Moal wrote:
>>>>> The submission of plugged BIOs is done using a work struct executing the
>>>>> function blk_zone_wplug_bio_work(). This function gets and submits a
>>>>> plugged zone write BIO and is guaranteed to operate on a valid zone
>>>>> write plug (with a reference count higher than 0) on entry as plugged
>>>>> BIOs hold a reference on their zone write plugs. However, once a BIO is
>>>>> submitted with submit_bio_noacct_nocheck(), the BIO may complete before
>>>>> blk_zone_wplug_bio_work(), with the BIO completion trigering a release
>>>>> and freeing of the zone write plug if the BIO is the last write to a
>>>>> zone (making the zone FULL). This potentially can result in the zone
>>>>> write plug being freed while the work is still active.
>>>>>
>>>>> Avoid this by calling flush_work() from disk_free_zone_wplug_rcu().
>>>>>
>>>>> Fixes: dd291d77cc90 ("block: Introduce zone write plugging")
>>>>> Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
>>>>> ---
>>>>>  block/blk-zoned.c | 2 ++
>>>>>  1 file changed, 2 insertions(+)
>>>>>
>>>>> diff --git a/block/blk-zoned.c b/block/blk-zoned.c
>>>>> index 3befebe6b319..685f0b9159fd 100644
>>>>> --- a/block/blk-zoned.c
>>>>> +++ b/block/blk-zoned.c
>>>>> @@ -526,6 +526,8 @@ static void disk_free_zone_wplug_rcu(struct rcu_head *rcu_head)
>>>>>  	struct blk_zone_wplug *zwplug =
>>>>>  		container_of(rcu_head, struct blk_zone_wplug, rcu_head);
>>>>>  
>>>>> +	flush_work(&zwplug->bio_work);
>>>>> +
>>>>>  	mempool_free(zwplug, zwplug->disk->zone_wplugs_pool);
>>>>>  }
>>>>
>>>> This is totally backwards. First of all, if you actually had work that
>>>> needed flushing at this point, the kernel would bomb spectacularly.
>>>> Secondly, what's the point of using RCU to protect this, if you're now
>>>> needing to flush work from the RCU callback? That's a clear sign that
>>>> something is very wrong here with your references / RCU usage.. The work
>>>> item should hold a reference to it, trying to paper around it like this
>>>> is not going to work at all.
>>>>
>>>> Why is the work item racing with RCU freeing?!
>>>
>>> The work item is a field of the zone write plug. Zone write plugs have
>>> references to them as long as BIOs are in flight and and the zone is
>>> not full. The zone write plug freeing through rcu is triggered by the
>>> last write to a zone that makes the zone full. But the completion of
>>> this last write BIO may happen right after the work issued the BIO
>>> with submit_bio_noacct_nocheck() and before blk_zone_wplug_bio_work()
>>> returns, while the work item is still active.
>>>
>>> The actual freeing of the plug happens only after the rcu grace
>>> period, and I was not entirely sure if this is enough to guarantee
>>> that the work thread is finished. But checking how the workqueue code
>>> processes the work item by calling the work function
>>> (blk_zone_wplug_bio_work() in this case), there is no issue because
>>> the work item (struct work_struct) is not touched once the work
>>> function is called. So there are no issues/races with freeing the zone
>>> write plug. I was overthinking this. My bad. We can drop this patch.
>>> Apologies for the noise.
>>
>> I took a closer look at the zone write plug reference handling, and it
>> still doesn't look very good. Why are some just atomic_dec and there's
>> just one spot that does dec_and_test? This again looks like janky
>> referencing, to be honest.
>>
>> The relationship seems like it should be pretty clear. Any bio inflight
>> against this zone plug should have a reference to it, AND the owner
>> should have a reference to it, otherwise any bio completion (which can
>> happen at ANY time) could free it. Any dropping of the ref should use a
>> helper that does atomic_dec_and_test(), eg what disk_put_zone_wplug()
>> does.
>>
>> There should be no doubt about the above at all. If the plug has been
>> added to a workqueue, it should be quite obvious that of course it has a
>> reference to it already, outside of the bio's that are in it.
>>
>> I'd strongly encourage getting this sorted out before the merge window,
>> I'm not at all convinced it's correct as-is. It's certainly not
>> obviously correct, which it should be. The RCU rules are pretty simple
>> if the the references are done in the kernel idiomatic way, but they are
>> not.
> 
> OK. I am traveling this week so I will not be able to send a
> well-tested cleanup patch but I will do so first thing next week.

Sounds good, thanks.

-- 
Jens Axboe


  reply	other threads:[~2024-04-24 13:58 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-20  7:58 [PATCH v2 0/2] Fixes for zone write plugging Damien Le Moal
2024-04-20  7:58 ` [PATCH v2 1/2] block: prevent freeing a zone write plug too early Damien Le Moal
2024-04-22  6:23   ` Christoph Hellwig
2024-04-23  6:12     ` Damien Le Moal
2024-04-23 14:21   ` Jens Axboe
2024-04-23 15:16     ` Damien Le Moal
2024-04-23 15:36       ` Jens Axboe
2024-04-23 18:19         ` Damien Le Moal
2024-04-24 13:58           ` Jens Axboe [this message]
2024-04-20  7:58 ` [PATCH v2 2/2] block: use a per disk workqueue for zone write plugging Damien Le Moal
2024-04-22  6:25   ` Christoph Hellwig
2024-04-23  6:01     ` Damien Le Moal
2024-04-23 15:46 ` (subset) [PATCH v2 0/2] Fixes " Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2787b6ed-c5a2-4f5a-9214-3fab75bfe1ee@kernel.dk \
    --to=axboe@kernel.dk \
    --cc=dlemoal@kernel.org \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).