All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests
Date: Tue, 15 Jun 2021 11:17:40 +0800	[thread overview]
Message-ID: <ef1b4408a5fd87b3b2c9cb0e891b892f@codeaurora.org> (raw)
In-Reply-To: <3fce15502c2742a4388817538eb4db97@codeaurora.org>

On 2021-06-15 10:36, Can Guo wrote:
> Hi Bart,
> 
> On 2021-06-15 02:49, Bart Van Assche wrote:
>> On 6/13/21 7:42 AM, Can Guo wrote:
>>> 2. ufshcd_abort() invokes ufshcd_err_handler() synchronously can have 
>>> a
>>> live lock issue, which is why I chose the asynchronous way (from the 
>>> first
>>> day I started to fix error handling). The live lock happens when 
>>> abort
>>> happens
>>> to a PM request, e.g., a SSU cmd sent from suspend/resume. Because 
>>> UFS
>>> error
>>> handler is synchronized with suspend/resume (by calling
>>> pm_runtime_get_sync()
>>> and lock_system_sleep()), the sequence is like:
>>> [1] ufshcd_wl_resume() sends SSU cmd
>>> [2] ufshcd_abort() calls UFS error handler
>>> [3] UFS error handler calls lock_system_sleep() and 
>>> pm_runtime_get_sync()
>>> 
>>> In above sequence, either lock_system_sleep() or 
>>> pm_runtime_get_sync()
>>> shall
>>> be blocked - [3] is blocked by [1], [2] is blocked by [3], while [1] 
>>> is
>>> blocked by [2].
>>> 
>>> For PM requests, I chose to abort them fast to unblock 
>>> suspend/resume,
>>> suspend/resume shall fail of course, but UFS error handler recovers
>>> PM errors anyways.
>> 
>> In the above sequence, does [2] perhaps refer to aborting the SSU
>> command submitted in step [1] (this is not clear to me)?
> 
> Yes, your understanding is right.
> 
>> If so, how about breaking the circular waiting cycle as follows:
>> - If it can happen that SSU succeeds after more than scsi_timeout
>>   seconds, define a custom timeout handler. From inside the timeout
>>   handler, schedule a link check and return BLK_EH_RESET_TIMER. If the
>>   link is no longer operational, run the error handler. If the link
>>   cannot be recovered by the error handler, fail all pending commands.
>>   This will prevent that ufshcd_abort() is called if a SSU command 
>> takes
>>   longer than expected. See also commit 0dd0dec1677e.
>> - Modify the UFS error handler such that it accepts a context 
>> argument.
>>   The context argument specifies whether or not the UFS error handler 
>> is
>>   called from inside a system suspend or system resume handler. If the
>>   UFS error handler is called from inside a system suspend or resume
>>   callback, skip the lock_system_sleep() and unlock_system_sleep()
>>   calls.
>> 
> 
> I am aware of commit 0dd0dec1677e, I gave my reviewed-by tag. Thank you
> for your suggestion and I believe it can resolve the cycle, because 
> actually
> I've considered the similar way (leverage hba->host->eh_noresume) last 
> year,
> but I didn't take this way due to below reasons:
> 
> 1. UFS error handler basically does one thing - reset and restore, 
> which
> stops hba [1], resets device [2] and re-probes the device [3]. Stopping 
> hba [1]
> shall complete any pending requests in the doorbell (with error or no 
> error).
> After [1], suspend/resume contexts, blocked by SSU cmd, shall be 
> unblocked
> right away to do whatever it needs to handle the SSU cmd failure 
> (completed
> in [1], so scsi_execute() returns an error), e.g., put link back to the 
> old
> state. call ufshcd_vops_suspend(), turn off irq/clocks/powers and 
> etc...
> However, reset and restore ([2] and [3]) is still running, and it can
> (most likely)
> be disturbed by suspend/resume. So passing a parameter or using
> hba->host->eh_noresume
> to skip lock_system_sleep() and unlock_system_sleep() can break the 
> cycle,
> but error handling may run concurrently with suspend/resume. Of course 
> we can
> modify suspend/resume to avoid it, but I was pursuing a minimal change
> to get this fixed.
> 

Add more - besides, SSU cmd is not the only PM request sent during 
suspend/resume,
last year (before your changes came in) it also sends request sense cmd 
without
checking the return value of it - so if request sense cmd abort happens, 
suspend/resume
still move forward, which can run concurrently with error handling. So I 
was pursuing
a way to make error handler less dependent on the bahaviours of these 
contexts.

Thanks,

Can Guo.

> 2. Whatever way we take to break the cycle, suspend/resume shall fail 
> and
> RPM framework shall save the error to dev.power.runtime_error, leaving
> the device in runtime suspended or active mode permanently. If it is 
> left
> runtime suspended, UFS driver won't accept cmd anymore, while if it is 
> left
> runtime active, powers of UFS device and host will be left ON, leading 
> to power
> penalty. So my main idea is to let suspend/resume contexts, blocked by 
> PM cmds,
> fail fast first and then error handler recover everything back to work.
> 
> Thanks,
> 
> Can Guo.
> 
>> Thanks,
>> 
>> Bart.

  reply	other threads:[~2021-06-15  3:17 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-10  4:43 [PATCH v3 0/9] Complementary changes for error handling Can Guo
2021-06-10  4:43 ` [PATCH v3 1/9] scsi: ufs: Differentiate status between hba pm ops and wl pm ops Can Guo
2021-06-10 11:15   ` Adrian Hunter
2021-06-11  0:53     ` Can Guo
2021-06-11 20:40   ` Bart Van Assche
2021-06-12  6:20     ` Can Guo
2021-06-16 17:50   ` Bart Van Assche
2021-06-23  1:32     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 2/9] scsi: ufs: Update the return value of supplier " Can Guo
2021-06-10  4:43 ` [PATCH v3 3/9] scsi: ufs: Enable IRQ after enabling clocks in error handling preparation Can Guo
2021-06-10  4:43 ` [PATCH v3 4/9] scsi: ufs: Complete the cmd before returning in queuecommand Can Guo
2021-06-11 20:52   ` Bart Van Assche
2021-06-12  7:38     ` Can Guo
2021-06-12 15:50       ` Bart Van Assche
2021-06-13 13:30         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation Can Guo
2021-06-10 12:30   ` Adrian Hunter
2021-06-11  3:01     ` Can Guo
2021-06-11 20:58       ` Bart Van Assche
2021-06-12  6:46         ` Can Guo
2021-06-12  9:49           ` Can Guo
2021-06-10  4:43 ` [PATCH v3 6/9] scsi: ufs: Update ufshcd_recover_pm_error() Can Guo
2021-06-10  4:43 ` [PATCH v3 7/9] scsi: ufs: Let host_sem cover the entire system suspend/resume Can Guo
2021-06-10 13:32   ` Adrian Hunter
2021-06-11  3:06     ` Can Guo
2021-06-11 21:00   ` Bart Van Assche
2021-06-12  6:46     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests Can Guo
2021-06-11 21:02   ` Bart Van Assche
2021-06-12  7:07     ` Can Guo
2021-06-12 16:50       ` Bart Van Assche
2021-06-13 14:42         ` Can Guo
2021-06-14 18:49           ` Bart Van Assche
2021-06-15  2:36             ` Can Guo
2021-06-15  3:17               ` Can Guo [this message]
2021-06-15 18:25               ` Bart Van Assche
2021-06-16  4:00                 ` Can Guo
2021-06-16  4:40                   ` Bart Van Assche
2021-06-16  8:47                     ` Can Guo
2021-06-16 17:55                       ` Bart Van Assche
2021-06-23  1:34                         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 9/9] scsi: ufs: Apply more limitations to user access Can Guo
2021-06-11 21:03   ` Bart Van Assche
2021-06-12  7:13     ` Can Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ef1b4408a5fd87b3b2c9cb0e891b892f@codeaurora.org \
    --to=cang@codeaurora.org \
    --cc=alim.akhtar@samsung.com \
    --cc=asutoshd@codeaurora.org \
    --cc=avri.altman@wdc.com \
    --cc=beanhuo@micron.com \
    --cc=bvanassche@acm.org \
    --cc=hongwus@codeaurora.org \
    --cc=jaegeuk@kernel.org \
    --cc=jejb@linux.ibm.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=nguyenb@codeaurora.org \
    --cc=stanley.chu@mediatek.com \
    --cc=ziqichen@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.