From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
hongwus@codeaurora.org, ziqichen@codeaurora.org,
linux-scsi@vger.kernel.org, kernel-team@android.com,
Alim Akhtar <alim.akhtar@samsung.com>,
Avri Altman <avri.altman@wdc.com>,
"James E.J. Bottomley" <jejb@linux.ibm.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>,
Stanley Chu <stanley.chu@mediatek.com>,
Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests
Date: Sun, 13 Jun 2021 22:42:55 +0800 [thread overview]
Message-ID: <75527f0ba5d315d6edbf800a2ddcf8c7@codeaurora.org> (raw)
In-Reply-To: <926d8c4a-0fbf-a973-188a-b10c9acaa444@acm.org>
Hi Bart,
On 2021-06-13 00:50, Bart Van Assche wrote:
> On 6/12/21 12:07 AM, Can Guo wrote:
>> Sigh... I also want my life and work to be easier...
>
> How about reducing the number of states and state transitions in the
> UFS
> driver? One source of complexity is that ufshcd_err_handler() is
> scheduled
> independently of the SCSI error handler and hence may run concurrently
> with the SCSI error handler. Has the following already been considered?
> - Call ufshcd_err_handler() synchronously from ufshcd_abort() and
> ufshcd_eh_host_reset_handler() instead of asynchronously.
1. ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and
flushes
it, so it is synchronous. ufshcd_eh_host_reset_handler() used to call
reset_and_restore() directly, which can run concurrently with UFS error
handler,
so I fixed it last year [1].
2. ufshcd_abort() invokes ufshcd_err_handler() synchronously can have a
live lock issue, which is why I chose the asynchronous way (from the
first
day I started to fix error handling). The live lock happens when abort
happens
to a PM request, e.g., a SSU cmd sent from suspend/resume. Because UFS
error
handler is synchronized with suspend/resume (by calling
pm_runtime_get_sync()
and lock_system_sleep()), the sequence is like:
[1] ufshcd_wl_resume() sends SSU cmd
[2] ufshcd_abort() calls UFS error handler
[3] UFS error handler calls lock_system_sleep() and
pm_runtime_get_sync()
In above sequence, either lock_system_sleep() or pm_runtime_get_sync()
shall
be blocked - [3] is blocked by [1], [2] is blocked by [3], while [1] is
blocked by [2].
For PM requests, I chose to abort them fast to unblock suspend/resume,
suspend/resume shall fail of course, but UFS error handler recovers
PM errors anyways.
> - Call scsi_schedule_eh() from ufshcd_uic_pwr_ctrl() and
> ufshcd_check_errors() instead of ufshcd_schedule_eh_work().
When ufshcd_uic_pwr_ctrl() and/or ufshcd_check_errors() report errors,
usually they are fatal errors, according to UFSHCI spec, SW should
re-probe
UFS to recover.
However scsi_schedule_eh() does more than that - scsi_unjam_host() sends
request sense cmd and calls scsi_eh_ready_devs(), while
scsi_eh_ready_devs()
sends test unit ready cmd and calls all the way down to
scsi_eh_device/target/
bus/host_reset(). But we only need scsi_eh_host_reset() in this case. I
know
you have concerns that scsi_schedule_eh() may run concurrently with UFS
error
handler, but as I mentioned above in [1] - I've made
ufshcd_eh_host_reset_handler()
synchronized with UFS error handler, hope that can ease your concern.
I am not saying your idea won't work, it is a good suggestion. I will
try
it after these changes go in, because it would require extra effort and
the
effort won't be minor - I need to consider how to remove/reduce the
ufshcd
states along with the change and the error injection and stability test
all
over again, which is a long way to go. As for now, at least current
changes
works well as per my test and we really need these changes for
Andriod12-5.10.
Thanks,
Can Guo.
>
> These changes will guarantee that all commands have completed or timed
> out before ufshcd_err_handler() is called. I think that would allow to
> remove e.g. the following code from the error handler:
>
> ufshcd_scsi_block_requests(hba);
> /* Drain ufshcd_queuecommand() */
> down_write(&hba->clk_scaling_lock);
> up_write(&hba->clk_scaling_lock);
>
> Thanks,
>
> Bart.
next prev parent reply other threads:[~2021-06-13 14:43 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-10 4:43 [PATCH v3 0/9] Complementary changes for error handling Can Guo
2021-06-10 4:43 ` [PATCH v3 1/9] scsi: ufs: Differentiate status between hba pm ops and wl pm ops Can Guo
2021-06-10 11:15 ` Adrian Hunter
2021-06-11 0:53 ` Can Guo
2021-06-11 20:40 ` Bart Van Assche
2021-06-12 6:20 ` Can Guo
2021-06-16 17:50 ` Bart Van Assche
2021-06-23 1:32 ` Can Guo
2021-06-10 4:43 ` [PATCH v3 2/9] scsi: ufs: Update the return value of supplier " Can Guo
2021-06-10 4:43 ` [PATCH v3 3/9] scsi: ufs: Enable IRQ after enabling clocks in error handling preparation Can Guo
2021-06-10 4:43 ` [PATCH v3 4/9] scsi: ufs: Complete the cmd before returning in queuecommand Can Guo
2021-06-11 20:52 ` Bart Van Assche
2021-06-12 7:38 ` Can Guo
2021-06-12 15:50 ` Bart Van Assche
2021-06-13 13:30 ` Can Guo
2021-06-10 4:43 ` [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation Can Guo
2021-06-10 12:30 ` Adrian Hunter
2021-06-11 3:01 ` Can Guo
2021-06-11 20:58 ` Bart Van Assche
2021-06-12 6:46 ` Can Guo
2021-06-12 9:49 ` Can Guo
2021-06-10 4:43 ` [PATCH v3 6/9] scsi: ufs: Update ufshcd_recover_pm_error() Can Guo
2021-06-10 4:43 ` [PATCH v3 7/9] scsi: ufs: Let host_sem cover the entire system suspend/resume Can Guo
2021-06-10 13:32 ` Adrian Hunter
2021-06-11 3:06 ` Can Guo
2021-06-11 21:00 ` Bart Van Assche
2021-06-12 6:46 ` Can Guo
2021-06-10 4:43 ` [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests Can Guo
2021-06-11 21:02 ` Bart Van Assche
2021-06-12 7:07 ` Can Guo
2021-06-12 16:50 ` Bart Van Assche
2021-06-13 14:42 ` Can Guo [this message]
2021-06-14 18:49 ` Bart Van Assche
2021-06-15 2:36 ` Can Guo
2021-06-15 3:17 ` Can Guo
2021-06-15 18:25 ` Bart Van Assche
2021-06-16 4:00 ` Can Guo
2021-06-16 4:40 ` Bart Van Assche
2021-06-16 8:47 ` Can Guo
2021-06-16 17:55 ` Bart Van Assche
2021-06-23 1:34 ` Can Guo
2021-06-10 4:43 ` [PATCH v3 9/9] scsi: ufs: Apply more limitations to user access Can Guo
2021-06-11 21:03 ` Bart Van Assche
2021-06-12 7:13 ` Can Guo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=75527f0ba5d315d6edbf800a2ddcf8c7@codeaurora.org \
--to=cang@codeaurora.org \
--cc=alim.akhtar@samsung.com \
--cc=asutoshd@codeaurora.org \
--cc=avri.altman@wdc.com \
--cc=beanhuo@micron.com \
--cc=bvanassche@acm.org \
--cc=hongwus@codeaurora.org \
--cc=jaegeuk@kernel.org \
--cc=jejb@linux.ibm.com \
--cc=kernel-team@android.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=nguyenb@codeaurora.org \
--cc=stanley.chu@mediatek.com \
--cc=ziqichen@codeaurora.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.