From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests
Date: Wed, 16 Jun 2021 16:47:31 +0800
Message-ID: <8eadb2f2e30804faf23c9c71e5724d08@codeaurora.org>
In-Reply-To: <0081ad7c-8a15-62bb-0e6a-82552aab5309@acm.org>

Hi Bart,

On 2021-06-16 12:40, Bart Van Assche wrote:
> On 6/15/21 9:00 PM, Can Guo wrote:
>> I would like to stick to my way as of now because
>> 
>> 1. Merely preventing task abort cannot prevent suspend/resume failure.
>> Task abort (of PM requests) is, in real cases, just one of many kinds
>> of failure which can fail the suspend/resume callbacks. During
>> suspend/resume, if AH8 errors and/or UIC errors happen, the IRQ handler
>> may complete the SSU cmd with errors and schedule the error handler (I've
>> seen such scenarios in real customer cases). My idea is to treat task
>> abort (of PM requests) as a failure (let scsi_execute() return with
>> whatever error) and let the error handler recover everything, just like
>> any other UFS errors which invoke the error handler. In case this again
>> raises the question of why we don't just do error recovery in
>> suspend/resume, let me paste my previous reply here:
> 
> Does this mean that the IRQ handler can complete an SSU command with an
> error and that the error handler can later recover from that error?

Not exactly; sorry that I didn't put it clearly. There are cases where cmds
are completed with an error (either OCS is not SUCCESS or the device returns
a check condition in the response) and are accompanied by fatal or non-fatal
UIC errors (UIC errors invoke the UFS error handler). For example, SSU is
completed with OCS_MISMATCH_RESPONSE_UPIU_SIZE (whatever the reason is in HW),
then auto-hibern8 enter (the AH8 timer timeout hba->ahit is set to a very low
value) kicks off right after but fails with fatal UIC errors. From the dmesg
log, these all happen at once. I've seen even more complicated cases where all
kinds of errors are mixed together.

> That sounds completely wrong to me. The IRQ handler should never complete
> any command with an error if that error could be recoverable. Instead, the
> IRQ handler should add that command to a list and leave it to the error
> handler to fail that command or to retry it.
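The deferral scheme described in the quoted paragraph can be illustrated with a minimal userspace C sketch. It assumes hypothetical types and functions (`struct cmd`, `struct host`, `irq_complete()`, `eh_run()`); none of these are real UFS driver or SCSI midlayer APIs. The completion path never fails a command itself. Instead it parks the command on a list, and the error handler later decides whether to hand it back for resubmission or to fail it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command descriptor; not a real UFS/SCSI structure. */
struct cmd {
	int tag;
	bool error;		/* set when the HW reports an error for this cmd */
	int retries_left;	/* how many more times the EH may resubmit it */
	int result;		/* 0 = success, -1 = failed back to the submitter */
	bool done;		/* completed to the upper layer */
	struct cmd *next;	/* link in the error handler's pending list */
};

/* Hypothetical host state: commands deferred by the IRQ path. */
struct host {
	struct cmd *eh_pending;
	bool eh_scheduled;
};

/* IRQ path: complete successful commands; defer failed ones to the
 * error handler instead of completing them with an error. */
static void irq_complete(struct host *h, struct cmd *c)
{
	assert(c != NULL);
	if (!c->error) {
		c->result = 0;
		c->done = true;
		return;
	}
	c->next = h->eh_pending;
	h->eh_pending = c;
	h->eh_scheduled = true;	/* stand-in for scheduling the EH work */
}

/* Error handler: after link recovery (not modeled), hand each deferred
 * command back for resubmission or fail it once its retries are used up.
 * Returns the number of commands handed back for resubmission. */
static int eh_run(struct host *h)
{
	int requeued = 0;
	struct cmd *c = h->eh_pending;

	h->eh_pending = NULL;
	h->eh_scheduled = false;
	while (c) {
		struct cmd *next = c->next;

		c->next = NULL;
		if (c->retries_left > 0) {
			c->retries_left--;
			c->error = false;	/* resubmission itself not modeled */
			requeued++;
		} else {
			c->result = -1;
			c->done = true;
		}
		c = next;
	}
	return requeued;
}
```

In this sketch, a command with one retry left is handed back (not completed), while a command with no retries left is completed with `result == -1` only by the error handler, never by the IRQ path.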
> 
>> 2. And say we want the SCSI layer to resubmit PM requests to prevent
>> suspend/resume failure: we would have to keep retrying the PM requests
>> (so long as the error handler can recover everything successfully),
>> meaning we would have to give them unlimited retries (which I think is a
>> bad idea). Otherwise (if they have zero or limited retries), in extreme
>> conditions the error handler may recover everything successfully every
>> time, but all these retries (say 3) still time out, which blocks power
>> management for too long (retries * 60 seconds). Most importantly, when
>> the last retry times out, the SCSI layer will complete the PM request
>> anyway (even if we return DID_IMM_RETRY), and we end up in the same
>> place: suspend/resume runs concurrently with the error handler and we
>> cannot recover saved PM errors.
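The worst case described in point 2 can be modeled with a small C sketch. It assumes a hypothetical `run_pm_request()` and callback policies that are not SCSI midlayer code: every try of a PM request (for example START STOP UNIT) that times out adds one full timeout to the time power management is held up, and after the last timeout the request is failed anyway. The sketch counts tries, not retries; how the midlayer counts the initial submission is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum pm_result { PM_OK, PM_FAILED };

/* Hypothetical model: a PM request gets up to max_attempts tries;
 * attempt_times_out(i) reports whether try i hits the per-command
 * timeout. *blocked_s accumulates time spent waiting on timeouts. */
static enum pm_result run_pm_request(int max_attempts, int timeout_s,
				     bool (*attempt_times_out)(int attempt),
				     int *blocked_s)
{
	assert(max_attempts > 0 && timeout_s > 0);
	*blocked_s = 0;
	for (int i = 0; i < max_attempts; i++) {
		if (!attempt_times_out(i))
			return PM_OK;	/* duration of a successful try not modeled */
		*blocked_s += timeout_s;
		/* Even if the error handler recovers the link here, the
		 * next try may still time out. */
	}
	/* Last try timed out: the request is completed with an error anyway. */
	return PM_FAILED;
}

/* Example callback policies. */
static bool always_times_out(int attempt)
{
	(void)attempt;
	return true;
}

static bool times_out_only_first(int attempt)
{
	return attempt == 0;
}
```

With three tries that all time out and a 60-second timeout, `run_pm_request(3, 60, always_times_out, &t)` returns `PM_FAILED` with `t == 180`. This matches the "retries * 60 seconds" estimate when each of the 3 retries is counted as one full timeout.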
> 
> Hmm ... it is not clear to me why this behavior is considered a problem?
> 

To me, task abort of PM requests is not worth treating so differently;
after all, suspend/resume may fail due to any kind of UFS error (as I've
explained many times). My idea is to let PM requests fail fast (60 seconds
have passed, so the device may be broken, and we have reason to fail the
request since it is just a passthrough request) and schedule the UFS error
handler. The UFS error handler then proceeds after suspend/resume fails out
and recovers everything in a safe environment. Would this approach not work?
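The proposed ordering can be sketched in userspace C with POSIX threads. The sketch assumes a hypothetical `struct host` whose `pm_lock` mutex stands in for a lock held across the whole system suspend/resume (patch 7/9 in this series proposes that `host_sem` cover it). The sketch does not use real UFS driver code. The abort path fast-fails the PM request and schedules the error handler, which cannot start recovering until suspend has failed out and released the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum event { EV_PM_REQ_FAST_FAILED, EV_SUSPEND_FAILED, EV_EH_RECOVERED };

/* Hypothetical host model; pm_lock is held for the whole suspend. */
struct host {
	pthread_mutex_t pm_lock;
	bool eh_requested;
	enum event log[3];
	int nlog;
};

/* Callers must hold pm_lock, which also serializes access to the log. */
static void log_event(struct host *h, enum event e)
{
	assert(h->nlog < 3);
	h->log[h->nlog++] = e;
}

/* Error handler: blocks until suspend has released pm_lock, then
 * recovers (the recovery itself is not modeled). */
static void *error_handler(void *arg)
{
	struct host *h = arg;

	pthread_mutex_lock(&h->pm_lock);
	if (h->eh_requested) {
		h->eh_requested = false;
		log_event(h, EV_EH_RECOVERED);
	}
	pthread_mutex_unlock(&h->pm_lock);
	return NULL;
}

/* Suspend path whose PM request gets aborted after its timeout.
 * Returns -1 (suspend failed) or -2 if the EH thread could not start. */
static int suspend_with_aborted_pm_request(struct host *h, pthread_t *eh)
{
	pthread_mutex_lock(&h->pm_lock);
	/* Abort path: fail the PM request immediately, request recovery. */
	h->eh_requested = true;
	log_event(h, EV_PM_REQ_FAST_FAILED);
	if (pthread_create(eh, NULL, error_handler, h) != 0) {
		pthread_mutex_unlock(&h->pm_lock);
		return -2;
	}
	/* Suspend sees the failed request and fails out. */
	log_event(h, EV_SUSPEND_FAILED);
	pthread_mutex_unlock(&h->pm_lock);
	return -1;
}
```

After joining the error handler thread, the log reads fast-fail, suspend failure, then recovery. The error handler thread may be running before suspend fails out, but it blocks on `pm_lock` until then.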

Thanks,

Can Guo.

> What is wrong with blocking RPM while a START STOP UNIT command is being
> processed? If there are UFS devices for which it takes long to process
> that command I think it is up to the vendors of these devices to fix
> these UFS devices.
> 
> Additionally, if a UFS device needs more than (retries * 60 seconds) to
> process a START STOP UNIT command, shouldn't it be marked as broken?
> 
> Thanks,
> 
> Bart.


Thread overview: 44+ messages
2021-06-10  4:43 [PATCH v3 0/9] Complementary changes for error handling Can Guo
2021-06-10  4:43 ` [PATCH v3 1/9] scsi: ufs: Differentiate status between hba pm ops and wl pm ops Can Guo
2021-06-10 11:15   ` Adrian Hunter
2021-06-11  0:53     ` Can Guo
2021-06-11 20:40   ` Bart Van Assche
2021-06-12  6:20     ` Can Guo
2021-06-16 17:50   ` Bart Van Assche
2021-06-23  1:32     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 2/9] scsi: ufs: Update the return value of supplier " Can Guo
2021-06-10  4:43 ` [PATCH v3 3/9] scsi: ufs: Enable IRQ after enabling clocks in error handling preparation Can Guo
2021-06-10  4:43 ` [PATCH v3 4/9] scsi: ufs: Complete the cmd before returning in queuecommand Can Guo
2021-06-11 20:52   ` Bart Van Assche
2021-06-12  7:38     ` Can Guo
2021-06-12 15:50       ` Bart Van Assche
2021-06-13 13:30         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation Can Guo
2021-06-10 12:30   ` Adrian Hunter
2021-06-11  3:01     ` Can Guo
2021-06-11 20:58       ` Bart Van Assche
2021-06-12  6:46         ` Can Guo
2021-06-12  9:49           ` Can Guo
2021-06-10  4:43 ` [PATCH v3 6/9] scsi: ufs: Update ufshcd_recover_pm_error() Can Guo
2021-06-10  4:43 ` [PATCH v3 7/9] scsi: ufs: Let host_sem cover the entire system suspend/resume Can Guo
2021-06-10 13:32   ` Adrian Hunter
2021-06-11  3:06     ` Can Guo
2021-06-11 21:00   ` Bart Van Assche
2021-06-12  6:46     ` Can Guo
2021-06-10  4:43 ` [PATCH v3 8/9] scsi: ufs: Update the fast abort path in ufshcd_abort() for PM requests Can Guo
2021-06-11 21:02   ` Bart Van Assche
2021-06-12  7:07     ` Can Guo
2021-06-12 16:50       ` Bart Van Assche
2021-06-13 14:42         ` Can Guo
2021-06-14 18:49           ` Bart Van Assche
2021-06-15  2:36             ` Can Guo
2021-06-15  3:17               ` Can Guo
2021-06-15 18:25               ` Bart Van Assche
2021-06-16  4:00                 ` Can Guo
2021-06-16  4:40                   ` Bart Van Assche
2021-06-16  8:47                     ` Can Guo [this message]
2021-06-16 17:55                       ` Bart Van Assche
2021-06-23  1:34                         ` Can Guo
2021-06-10  4:43 ` [PATCH v3 9/9] scsi: ufs: Apply more limitations to user access Can Guo
2021-06-11 21:03   ` Bart Van Assche
2021-06-12  7:13     ` Can Guo
