All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests
Date: Sun, 13 Jun 2021 22:42:55 +0800	[thread overview]
Message-ID: <75527f0ba5d315d6edbf800a2ddcf8c7@codeaurora.org> (raw)
In-Reply-To: <926d8c4a-0fbf-a973-188a-b10c9acaa444@acm.org>

Hi Bart,

On 2021-06-13 00:50, Bart Van Assche wrote:
> On 6/12/21 12:07 AM, Can Guo wrote:
>> Sigh... I also want my life and work to be easier...
> 
> How about reducing the number of states and state transitions in the 
> UFS
> driver? One source of complexity is that ufshcd_err_handler() is 
> scheduled
> independently of the SCSI error handler and hence may run concurrently
> with the SCSI error handler. Has the following already been considered?
> - Call ufshcd_err_handler() synchronously from ufshcd_abort() and
> ufshcd_eh_host_reset_handler() instead of asynchronously.

1. ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and 
flushes
it, so it is synchronous. ufshcd_eh_host_reset_handler() used to call
reset_and_restore() directly, which can run concurrently with UFS error 
handler,
so I fixed it last year [1].

2. ufshcd_abort() invokes ufshcd_err_handler() synchronously can have a
live lock issue, which is why I chose the asynchronous way (from the 
first
day I started to fix error handling). The live lock happens when abort 
happens
to a PM request, e.g., a SSU cmd sent from suspend/resume. Because UFS 
error
handler is synchronized with suspend/resume (by calling 
pm_runtime_get_sync()
and lock_system_sleep()), the sequence is like:
[1] ufshcd_wl_resume() sends SSU cmd
[2] ufshcd_abort() calls UFS error handler
[3] UFS error handler calls lock_system_sleep() and 
pm_runtime_get_sync()

In above sequence, either lock_system_sleep() or pm_runtime_get_sync() 
shall
be blocked - [3] is blocked by [1], [2] is blocked by [3], while [1] is 
blocked by [2].

For PM requests, I chose to abort them fast to unblock suspend/resume,
suspend/resume shall fail of course, but UFS error handler recovers
PM errors anyways.

> - Call scsi_schedule_eh() from ufshcd_uic_pwr_ctrl() and
> ufshcd_check_errors() instead of ufshcd_schedule_eh_work().

When ufshcd_uic_pwr_ctrl() and/or ufshcd_check_errors() report errors,
usually they are fatal errors, according to UFSHCI spec, SW should 
re-probe
UFS to recover.

However scsi_schedule_eh() does more than that - scsi_unjam_host() sends
request sense cmd and calls scsi_eh_ready_devs(), while 
scsi_eh_ready_devs()
sends test unit ready cmd and calls all the way down to 
scsi_eh_device/target/
bus/host_reset(). But we only need scsi_eh_host_reset() in this case. I 
know
you have concerns that scsi_schedule_eh() may run concurrently with UFS 
error
handler, but as I mentioned above in [1] - I've made 
ufshcd_eh_host_reset_handler()
synchronized with UFS error handler, hope that can ease your concern.

I am not saying your idea won't work, it is a good suggestion. I will 
try
it after these changes go in, because it would require extra effort and 
the
effort won't be minor - I need to consider how to remove/reduce the 
ufshcd
states along with the change and the error injection and stability test 
all
over again, which is a long way to go. As for now, at least current 
changes
works well as per my test and we really need these changes for 
Andriod12-5.10.

Thanks,

Can Guo.

> 
> These changes will guarantee that all commands have completed or timed
> out before ufshcd_err_handler() is called. I think that would allow to
> remove e.g. the following code from the error handler:
> 
> 	ufshcd_scsi_block_requests(hba);
> 	/* Drain ufshcd_queuecommand() */
> 	down_write(&hba->clk_scaling_lock);
> 	up_write(&hba->clk_scaling_lock);
> 
> Thanks,
> 
> Bart.

  reply	other threads:[~2021-06-13 14:43 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10  4:43 [PATCH v3 0/9] Complementary changes for error handling Can Guo
2021-06-10  4:43 ` [PATCH v3 1/9] scsi: ufs: Differentiate status between hba pm ops and wl pm ops Can Guo
2021-06-10 11:15   ` Adrian Hunter
2021-06-11  0:53     ` Can Guo
2021-06-11 20:40   ` Bart Van Assche
2021-06-12  6:20     ` Can Guo
2021-06-16 17:50   ` Bart Van Assche
2021-06-23  1:32     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 2/9] scsi: ufs: Update the return value of supplier " Can Guo
2021-06-10  4:43 ` [PATCH v3 3/9] scsi: ufs: Enable IRQ after enabling clocks in error handling preparation Can Guo
2021-06-10  4:43 ` [PATCH v3 4/9] scsi: ufs: Complete the cmd before returning in queuecommand Can Guo
2021-06-11 20:52   ` Bart Van Assche
2021-06-12  7:38     ` Can Guo
2021-06-12 15:50       ` Bart Van Assche
2021-06-13 13:30         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation Can Guo
2021-06-10 12:30   ` Adrian Hunter
2021-06-11  3:01     ` Can Guo
2021-06-11 20:58       ` Bart Van Assche
2021-06-12  6:46         ` Can Guo
2021-06-12  9:49           ` Can Guo
2021-06-10  4:43 ` [PATCH v3 6/9] scsi: ufs: Update ufshcd_recover_pm_error() Can Guo
2021-06-10  4:43 ` [PATCH v3 7/9] scsi: ufs: Let host_sem cover the entire system suspend/resume Can Guo
2021-06-10 13:32   ` Adrian Hunter
2021-06-11  3:06     ` Can Guo
2021-06-11 21:00   ` Bart Van Assche
2021-06-12  6:46     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests Can Guo
2021-06-11 21:02   ` Bart Van Assche
2021-06-12  7:07     ` Can Guo
2021-06-12 16:50       ` Bart Van Assche
2021-06-13 14:42         ` Can Guo [this message]
2021-06-14 18:49           ` Bart Van Assche
2021-06-15  2:36             ` Can Guo
2021-06-15  3:17               ` Can Guo
2021-06-15 18:25               ` Bart Van Assche
2021-06-16  4:00                 ` Can Guo
2021-06-16  4:40                   ` Bart Van Assche
2021-06-16  8:47                     ` Can Guo
2021-06-16 17:55                       ` Bart Van Assche
2021-06-23  1:34                         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 9/9] scsi: ufs: Apply more limitations to user access Can Guo
2021-06-11 21:03   ` Bart Van Assche
2021-06-12  7:13     ` Can Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=75527f0ba5d315d6edbf800a2ddcf8c7@codeaurora.org \
    --to=cang@codeaurora.org \
    --cc=alim.akhtar@samsung.com \
    --cc=asutoshd@codeaurora.org \
    --cc=avri.altman@wdc.com \
    --cc=beanhuo@micron.com \
    --cc=bvanassche@acm.org \
    --cc=hongwus@codeaurora.org \
    --cc=jaegeuk@kernel.org \
    --cc=jejb@linux.ibm.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=nguyenb@codeaurora.org \
    --cc=stanley.chu@mediatek.com \
    --cc=ziqichen@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.