Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation

From: Can Guo <cang@codeaurora.org>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation
Date: Fri, 11 Jun 2021 11:01:12 +0800
Message-ID: <f0ae504bccc428fa674a183608174bdd@codeaurora.org>
In-Reply-To: <6abb81f6-4dd2-082e-9440-4b549f105788@intel.com>

Hi Adrian,

On 2021-06-10 20:30, Adrian Hunter wrote:
> On 10/06/21 7:43 am, Can Guo wrote:
>> Commit cb7e6f05fce67c965194ac04467e1ba7bc70b069 ("scsi: ufs: core: Enable
>> power management for wlun") moves UFS operations out of ufshcd_resume(), so
>> in error handling preparation, if ufshcd hba has failed to resume, there is
>> no point to re-enable IRQ/clk/pwr.
> 
> I am not sure how cb7e6f05fce67c965194ac04467e1ba7bc70b069 made things any
> different,

Previously, without commit cb7e6f05fce67c965194ac04467e1ba7bc70b069,
ufshcd_resume() could turn off pwr and clk because of a UFS error, e.g., a
link transition failure or an SSU error/abort (and these UFS errors would
invoke error handling). When error handling kicks in, it should re-enable
pwr and clk before proceeding. Now, commit
cb7e6f05fce67c965194ac04467e1ba7bc70b069 makes ufshcd_resume() control only
pwr and clk, meaning that if ufshcd_resume() fails, there is nothing we can
do about it: enabling pwr or clk must have failed, and not because of a UFS
error. This is why I am removing the pwr/clk re-enabling from error
handling prepare.
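
The reasoning above can be illustrated with a small userspace model. This is
only a sketch, not kernel code: struct mock_hba and every mock_* function are
hypothetical stand-ins invented for illustration. The only behaviour modelled
is that prepare gives up when the host cannot be resumed, instead of trying
to re-enable pwr/clk itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the host state; not the real struct ufs_hba. */
struct mock_hba {
	bool resume_ok;      /* would enabling pwr/clk succeed? */
	bool powered;        /* set once a resume has succeeded */
	int  usage_count;    /* models the runtime PM usage counter */
	bool eh_in_progress; /* models the "error handling in progress" flag */
};

/* Models a get_sync call: take a reference, then try to resume the host. */
static int mock_get_sync(struct mock_hba *hba)
{
	hba->usage_count++;
	if (!hba->powered && hba->resume_ok)
		hba->powered = true;
	return hba->powered ? 0 : -EIO;
}

/* Models a put call: drop the reference taken by mock_get_sync(). */
static void mock_put(struct mock_hba *hba)
{
	assert(hba->usage_count > 0);
	hba->usage_count--;
}

/*
 * Simplified prepare: if the host is still down after the resume attempt,
 * pwr/clk enabling itself failed, so drop the reference and bail out.
 */
int mock_eh_prepare(struct mock_hba *hba)
{
	mock_get_sync(hba);
	if (!hba->powered) {
		mock_put(hba);
		return -EINVAL;
	}
	hba->eh_in_progress = true;
	return 0;
}

/* Undo what a successful mock_eh_prepare() did. */
void mock_eh_unprepare(struct mock_hba *hba)
{
	hba->eh_in_progress = false;
	mock_put(hba);
}
```

A caller in this model would skip recovery entirely when mock_eh_prepare()
returns an error, which mirrors the early return added to
ufshcd_err_handler() in the quoted patch.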

> but what I really wonder is why we don't just do recovery
> directly in __ufshcd_wl_suspend() and __ufshcd_wl_resume() and strip all
> the PM complexity out of ufshcd_err_handling()?
> 

This is a good question, and I have been struggling with this idea ever
since I started to fix error handling.

Just so you know, there are both runtime and system suspend/resume. Error
handling has the same nature as user access: it is unpredictable, meaning
it can be invoked at any time (from the IRQ handler), even when there are no
ongoing cmd/data transactions (e.g., auto hibern8 failures and UIC errors
such as DME errors and some data link layer errors) [1], unless you disable
the UFS IRQ.

For runtime suspend/resume this is fine, since we call
pm_runtime_get/put_sync() in error handling, so error handling won't run in
parallel with runtime suspend/resume.
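
As a rough illustration of why the get/put pair is enough for the runtime
case, here is a userspace model of a usage counter. It is a sketch under
assumptions: struct mock_rpm and the mock_* helpers are hypothetical, and the
-EAGAIN return is simply this model's way of saying "suspend refused".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a device's runtime PM state. */
struct mock_rpm {
	int  usage_count; /* references held by active users */
	bool suspended;
};

/* Models a get_sync call: take a reference and make sure the device is up. */
void mock_rpm_get_sync(struct mock_rpm *d)
{
	d->usage_count++;
	d->suspended = false;
}

/* Models a put call: drop the reference taken by mock_rpm_get_sync(). */
void mock_rpm_put(struct mock_rpm *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/* Models a runtime suspend attempt: refused while a reference is held. */
int mock_rpm_try_suspend(struct mock_rpm *d)
{
	if (d->usage_count > 0)
		return -EAGAIN;
	d->suspended = true;
	return 0;
}

/*
 * Error handling brackets its work with get/put. The return value is the
 * result of a suspend request that arrives in the middle of recovery.
 */
int mock_eh_with_rpm_ref(struct mock_rpm *d)
{
	int suspend_attempt;

	mock_rpm_get_sync(d);
	suspend_attempt = mock_rpm_try_suspend(d); /* rejected: ref is held */
	mock_rpm_put(d);
	return suspend_attempt;
}
```

In this model a suspend request can only succeed after the handler has
dropped its reference.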

For system suspend/resume, since error handling has the same nature as user
access, we use host_sem to prevent error handling from running concurrently
with system suspend/resume.
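
The host_sem idea can be sketched in userspace with a toy binary semaphore.
Everything here is an assumption made for illustration: struct mock_sem,
struct mock_host and the mock_* functions are hypothetical, and the
non-blocking try-acquire is used only so the exclusion is observable in a
single thread (a kernel-side handler would presumably block in down()
instead).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy binary semaphore standing in for host_sem (hypothetical model). */
struct mock_sem {
	atomic_int count; /* 1 = free, 0 = held */
};

void mock_sem_init(struct mock_sem *s)
{
	atomic_init(&s->count, 1);
}

/* Non-blocking acquire: succeeds only if the semaphore is currently free. */
bool mock_sem_try_down(struct mock_sem *s)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&s->count, &expected, 0);
}

/* Release the semaphore. */
void mock_sem_up(struct mock_sem *s)
{
	atomic_store(&s->count, 1);
}

struct mock_host {
	struct mock_sem host_sem;
	int recoveries; /* how many times error handling actually ran */
};

void mock_host_init(struct mock_host *h)
{
	mock_sem_init(&h->host_sem);
	h->recoveries = 0;
}

/* A system PM transition owns the semaphore from begin to end. */
bool mock_sys_pm_begin(struct mock_host *h)
{
	return mock_sem_try_down(&h->host_sem);
}

void mock_sys_pm_end(struct mock_host *h)
{
	mock_sem_up(&h->host_sem);
}

/* Error handling runs only while it holds the semaphore itself. */
bool mock_eh_under_host_sem(struct mock_host *h)
{
	if (!mock_sem_try_down(&h->host_sem))
		return false; /* a PM transition is in flight */
	h->recoveries++;
	mock_sem_up(&h->host_sem);
	return true;
}
```

While a transition started with mock_sys_pm_begin() is in flight,
mock_eh_under_host_sem() returns false and performs no recovery; once
mock_sys_pm_end() has run, it can proceed.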

Back to your question: can we just do recovery directly in
__ufshcd_wl_suspend() and __ufshcd_wl_resume()? Yes, we can.

However, the reasons why I chose not to do it that way are as follows
(although error handler prepare has become much simpler after applying this
change):

1. I want to keep all the complexity within the error handler and redirect
all error recovery needs to it. That avoids calling
ufshcd_reset_and_restore() and/or flush_work(&hba->eh_work) here and there.
The entire UFS suspend/resume is already complex enough; I don't want to
mess with it.

2. We do explicit recovery only when we see certain errors, e.g., when the
H8 enter func returns an error during suspend, but as mentioned above [1],
error handling may already have been invoked from the IRQ handler (due to
all kinds of UIC errors before the H8 enter func returns). So we still need
host_sem (in the case of system suspend/resume) to avoid concurrency.

3. During system suspend/resume, error handling can be invoked (due to
non-fatal errors) even though the UFS cmds return no error at all. As
above, we need host_sem to avoid concurrency.

There are more reasons why I chose this way, but really it comes down to
choosing this way or another. I am glad to see that someone cares about
error handling and can make it better and more robust, whichever way that
is. :)

Thanks,
Can Guo.

>> 
>> Signed-off-by: Can Guo <cang@codeaurora.org>
>> ---
>>  drivers/scsi/ufs/ufshcd.c | 58 +++++++++++++++++++++++++----------------------
>>  1 file changed, 31 insertions(+), 27 deletions(-)
>> 
>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>> index 7dc0fda..0afad6b 100644
>> --- a/drivers/scsi/ufs/ufshcd.c
>> +++ b/drivers/scsi/ufs/ufshcd.c
>> @@ -2727,8 +2727,8 @@ static int ufshcd_queuecommand(struct Scsi_Host *host, struct scsi_cmnd *cmd)
>>  		break;
>>  	case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>>  		/*
>> -		 * pm_runtime_get_sync() is used at error handling preparation
>> -		 * stage. If a scsi cmd, e.g. the SSU cmd, is sent from hba's
>> +		 * ufshcd_rpm_get_sync() is used at error handling preparation
>> +		 * stage. If a scsi cmd, e.g., the SSU cmd, is sent from the
>>  		 * PM ops, it can never be finished if we let SCSI layer keep
>>  		 * retrying it, which gets err handler stuck forever. Neither
>>  		 * can we let the scsi cmd pass through, because UFS is in bad
>> @@ -5915,29 +5915,26 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>>  	}
>>  }
>> 
>> -static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>> +static int ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>  {
>> +	/*
>> +	 * Exclusively call pm_runtime_get_sync(hba->dev) once, in case
>> +	 * following ufshcd_rpm_get_sync() fails.
>> +	 */
>> +	pm_runtime_get_sync(hba->dev);
>> +	/* End of the world. */
>> +	if (pm_runtime_suspended(hba->dev)) {
>> +		pm_runtime_put(hba->dev);
>> +		return -EINVAL;
>> +	}
>> +
>> +	ufshcd_set_eh_in_progress(hba);
>>  	ufshcd_rpm_get_sync(hba);
>> -	if (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>> +	if (pm_runtime_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
>>  	    hba->is_wl_sys_suspended) {
>> -		enum ufs_pm_op pm_op;
>> +		enum ufs_pm_op pm_op = hba->is_wl_sys_suspended ?
>> +				       UFS_SYSTEM_PM : UFS_RUNTIME_PM;
>> 
>> -		/*
>> -		 * Don't assume anything of resume, if
>> -		 * resume fails, irq and clocks can be OFF, and powers
>> -		 * can be OFF or in LPM.
>> -		 */
>> -		ufshcd_setup_hba_vreg(hba, true);
>> -		ufshcd_setup_vreg(hba, true);
>> -		ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq);
>> -		ufshcd_config_vreg_hpm(hba, hba->vreg_info.vccq2);
>> -		ufshcd_hold(hba, false);
>> -		if (!ufshcd_is_clkgating_allowed(hba)) {
>> -			ufshcd_setup_clocks(hba, true);
>> -			ufshcd_enable_irq(hba);
>> -		}
>> -		ufshcd_release(hba);
>> -		pm_op = hba->is_wl_sys_suspended ? UFS_SYSTEM_PM : UFS_RUNTIME_PM;
>>  		ufshcd_vops_resume(hba, pm_op);
>>  	} else {
>>  		ufshcd_hold(hba, false);
>> @@ -5951,22 +5948,25 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>  	down_write(&hba->clk_scaling_lock);
>>  	up_write(&hba->clk_scaling_lock);
>>  	cancel_work_sync(&hba->eeh_work);
>> +	return 0;
>>  }
>> 
>>  static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>  {
>> +	ufshcd_clear_eh_in_progress(hba);
>>  	ufshcd_scsi_unblock_requests(hba);
>>  	ufshcd_release(hba);
>>  	if (ufshcd_is_clkscaling_supported(hba))
>>  		ufshcd_clk_scaling_suspend(hba, false);
>>  	ufshcd_clear_ua_wluns(hba);
>>  	ufshcd_rpm_put(hba);
>> +	pm_runtime_put(hba->dev);
>>  }
>> 
>>  static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba)
>>  {
>>  	return (!hba->is_powered || hba->shutting_down ||
>> -		!hba->sdev_ufs_device ||
>> +		!hba->sdev_ufs_device || hba->is_sys_suspended ||
>>  		hba->ufshcd_state == UFSHCD_STATE_ERROR ||
>>  		(!(hba->saved_err || hba->saved_uic_err || hba->force_reset ||
>>  		   ufshcd_is_link_broken(hba))));
>> @@ -6052,9 +6052,13 @@ static void ufshcd_err_handler(struct work_struct *work)
>>  		up(&hba->host_sem);
>>  		return;
>>  	}
>> -	ufshcd_set_eh_in_progress(hba);
>>  	spin_unlock_irqrestore(hba->host->host_lock, flags);
>> -	ufshcd_err_handling_prepare(hba);
>> +	if (ufshcd_err_handling_prepare(hba)) {
>> +		dev_err(hba->dev, "%s: error handling preparation failed\n",
>> +				__func__);
>> +		up(&hba->host_sem);
>> +		return;
>> +	}
>>  	/* Complete requests that have door-bell cleared by h/w */
>>  	ufshcd_complete_requests(hba);
>>  	spin_lock_irqsave(hba->host->host_lock, flags);
>> @@ -6198,7 +6202,6 @@ static void ufshcd_err_handler(struct work_struct *work)
>>  			dev_err_ratelimited(hba->dev, "%s: exit: saved_err 0x%x saved_uic_err 0x%x",
>>  			    __func__, hba->saved_err, hba->saved_uic_err);
>>  	}
>> -	ufshcd_clear_eh_in_progress(hba);
>>  	spin_unlock_irqrestore(hba->host->host_lock, flags);
>>  	ufshcd_err_handling_unprepare(hba);
>>  	up(&hba->host_sem);
>> @@ -8999,6 +9002,9 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>> 
>>  	/* Enable Auto-Hibernate if configured */
>>  	ufshcd_auto_hibern8_enable(hba);
>> +
>> +	hba->clk_gating.is_suspended = false;
>> +	ufshcd_release(hba);
>>  	goto out;
>> 
>>  set_old_link_state:
>> @@ -9008,8 +9014,6 @@ static int __ufshcd_wl_resume(struct ufs_hba *hba, enum ufs_pm_op pm_op)
>>  out:
>>  	if (ret)
>>  		ufshcd_update_evt_hist(hba, UFS_EVT_WL_RES_ERR, (u32)ret);
>> -	hba->clk_gating.is_suspended = false;
>> -	ufshcd_release(hba);
>>  	hba->wl_pm_op_in_progress = false;
>>  	return ret <= 0 ? ret : -EINVAL;
>>  }
>>
