From: "Zhang, GuoQing (Sam)" <GuoQing.Zhang@amd.com>
To: "Feng, Kenneth" <Kenneth.Feng@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>
Cc: "Zhang, Owen(SRDC)" <Owen.Zhang2@amd.com>,
	"Aldabagh, Maad" <Maad.Aldabagh@amd.com>,
	"Ma, Qing (Mark)" <Qing.Ma@amd.com>,
	"Li, Yunxiang (Teddy)" <Yunxiang.Li@amd.com>
Subject: Re: [PATCH 2/2] drm/amd/amdgpu: use the default reset for ras recovery
Date: Mon, 6 May 2024 08:16:59 +0000
Message-ID: <DM4PR12MB5937E1C985C21BCDC3C96FE9E51C2@DM4PR12MB5937.namprd12.prod.outlook.com>
In-Reply-To: <DM4PR12MB51656C0277435B3DB92BBE608E1B2@DM4PR12MB5165.namprd12.prod.outlook.com>

[AMD Official Use Only - General]

Hi @Deucher, Alexander and @Koenig, Christian,

Could you help review this patch?
Without this patch, when a customer sets the `reset_method=3` modprobe parameter to use mode2 reset, RAS recovery also uses mode2 reset and skips mode1 reset.
When an ECC error happens, the GPU can't be recovered with mode2 reset, and because mode1 reset is skipped, the GPU reset fails.

This patch makes RAS recovery (ECC error) always use mode1 reset when `reset_method=3` is set.
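
To make the intended behavior concrete, here is a minimal standalone sketch of the save/override/restore pattern the patch uses. It is not the driver code: `reset_method` and `gpu_recovery` are plain variables standing in for `amdgpu_reset_method` and `amdgpu_gpu_recovery`, `do_gpu_recover()` stands in for `amdgpu_device_gpu_recover()`, and the sketch assumes the default ("auto", -1) selection resolves to mode1 reset on the affected hardware.

```c
#include <assert.h>

/* Reset method values as used in this thread: -1 = auto (default),
 * 3 = mode2. The value 2 for mode1 is assumed here. */
enum { RESET_AUTO = -1, RESET_MODE1 = 2, RESET_MODE2 = 3 };

/* Hypothetical stand-ins for the module parameters. */
static int reset_method = RESET_MODE2; /* models amdgpu_reset_method */
static int gpu_recovery = 2;           /* models amdgpu_gpu_recovery */

/* Hypothetical stand-in for amdgpu_device_gpu_recover(): returns the
 * method that would be used. ASSUMPTION: "auto" resolves to mode1. */
static int do_gpu_recover(void)
{
	return reset_method == RESET_AUTO ? RESET_MODE1 : reset_method;
}

/* Models the patched recovery path: temporarily switch to the default
 * reset method for RAS recovery, then restore the user's setting. */
static int ras_recover(void)
{
	int save_reset_method = reset_method;
	int used;

	if (gpu_recovery == 2)
		reset_method = RESET_AUTO;

	used = do_gpu_recover();

	if (gpu_recovery == 2)
		reset_method = save_reset_method;

	/* The user's setting is unchanged once recovery finishes. */
	assert(reset_method == save_reset_method);
	return used;
}
```

With `reset_method` set to 3 and `gpu_recovery` set to 2, `ras_recover()` in this sketch uses mode1 and leaves `reset_method` at 3 afterwards. With any other `gpu_recovery` value the override is skipped and mode2 is used, as before the patch.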

Thanks
Sam

From: Feng, Kenneth <Kenneth.Feng@amd.com>
Date: Monday, April 29, 2024 at 16:15
To: Feng, Kenneth <Kenneth.Feng@amd.com>, amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>, Zhang, GuoQing (Sam) <GuoQing.Zhang@amd.com>
Cc: Zhang, Owen(SRDC) <Owen.Zhang2@amd.com>, Aldabagh, Maad <Maad.Aldabagh@amd.com>, Ma, Qing (Mark) <Qing.Ma@amd.com>
Subject: RE: [PATCH 2/2] drm/amd/amdgpu: use the default reset for ras recovery

+@Zhang, GuoQing (Sam)

-----Original Message-----
From: Kenneth Feng <kenneth.feng@amd.com>
Sent: Monday, April 29, 2024 3:32 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Owen(SRDC) <Owen.Zhang2@amd.com>; Aldabagh, Maad <Maad.Aldabagh@amd.com>; Ma, Qing (Mark) <Qing.Ma@amd.com>; Feng, Kenneth <Kenneth.Feng@amd.com>
Subject: [PATCH 2/2] drm/amd/amdgpu: use the default reset for ras recovery

use the default reset for ras recovery

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index a037e8fba29f..f92b2c4f0d5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2437,6 +2437,7 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
        struct amdgpu_device *adev = ras->adev;
        struct list_head device_list, *device_list_handle =  NULL;
        struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev);
+       int save_reset_method = amdgpu_reset_method;

        if (hive) {
                atomic_set(&hive->ras_recovery, 1);
@@ -2501,7 +2502,13 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
                        }
                }

+               if (amdgpu_gpu_recovery == 2)
+                       amdgpu_reset_method = -1;
+
                amdgpu_device_gpu_recover(ras->adev, NULL, &reset_context);
+
+               if (amdgpu_gpu_recovery == 2)
+                       amdgpu_reset_method = save_reset_method;
        }
        atomic_set(&ras->in_recovery, 0);
        if (hive) {
--
2.34.1


Thread overview: 7+ messages
2024-04-29  7:31 [PATCH 1/2] drm/amd/amdgpu: customized the reset to skip soft recovery Kenneth Feng
2024-04-29  7:31 ` [PATCH 2/2] drm/amd/amdgpu: use the default reset for ras recovery Kenneth Feng
2024-04-29  8:15   ` Feng, Kenneth
2024-05-06  8:16     ` Zhang, GuoQing (Sam) [this message]
2024-05-06 19:30   ` Alex Deucher
2024-04-29  8:14 ` [PATCH 1/2] drm/amd/amdgpu: customized the reset to skip soft recovery Feng, Kenneth
2024-05-06  8:00   ` Zhang, GuoQing (Sam)
