From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: <shiju.jose@huawei.com>, <linux-cxl@vger.kernel.org>,
<linux-acpi@vger.kernel.org>, <linux-mm@kvack.org>,
<dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
<ira.weiny@intel.com>, <linux-edac@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <david@redhat.com>,
<Vilas.Sridharan@amd.com>, <leo.duran@amd.com>,
<Yazen.Ghannam@amd.com>, <rientjes@google.com>,
<jiaqiyan@google.com>, <tony.luck@intel.com>, <Jon.Grimm@amd.com>,
<dave.hansen@linux.intel.com>, <rafael@kernel.org>,
<lenb@kernel.org>, <naoya.horiguchi@nec.com>,
<james.morse@arm.com>, <jthoughton@google.com>,
<somasundaram.a@hpe.com>, <erdemaktas@google.com>,
<pgonda@google.com>, <duenwen@google.com>,
<mike.malvestuto@intel.com>, <gthelen@google.com>,
<wschwartz@amperecomputing.com>, <dferguson@amperecomputing.com>,
<wbs@os.amperecomputing.com>, <nifan.cxl@gmail.com>,
<tanxiaofei@huawei.com>, <prime.zeng@hisilicon.com>,
<kangkang.shen@futurewei.com>, <wanghuiqiang@huawei.com>,
<linuxarm@huawei.com>
Subject: Re: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature
Date: Fri, 10 May 2024 12:23:25 +0100
Message-ID: <20240510122325.00005e83@Huawei.com>
In-Reply-To: <663d69c61db8c_3d7b4294e0@dwillia2-mobl3.amr.corp.intel.com.notmuch>
On Thu, 9 May 2024 17:26:46 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> shiju.jose@ wrote:
> > From: Shiju Jose <shiju.jose@huawei.com>
> >
> > CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> > feature. The device patrol scrub proactively locates and makes corrections
> > to errors in regular cycle.
> >
> > Allow specifying the number of hours within which the patrol scrub must be
> > completed, subject to minimum and maximum limits reported by the device.
> > Also allow disabling scrub allowing trade-off error rates against
> > performance.
> >
> > Register with scrub subsystem to provide scrub control attributes to the
> > user.
> >
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> [..]
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 0c79d9ce877c..399e43463626 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
> > if (!cxlds->media_ready)
> > return -EBUSY;
> >
> > + rc = cxl_mem_patrol_scrub_init(cxlmd);
> > + if (rc) {
> > + dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
> > + return rc;
> > + }
>
> 2 concerns:
>
> * Why should cxl_mem_probe() fail just because this optional
> scrub interface did not register?
>
Agreed. Flip the dev_dbg() to dev_warn() and carry on with the probe.
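
Roughly, the non-fatal handling could look like the sketch below. It is a
userspace mock, not the driver code: mock_mem_probe() and
mock_patrol_scrub_init() are hypothetical stand-ins for cxl_mem_probe() and
cxl_mem_patrol_scrub_init(), and fprintf() stands in for dev_warn().

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for cxl_mem_patrol_scrub_init().
 * Returns 0 on success or a negative errno on failure.
 */
static int mock_patrol_scrub_init(int simulate_failure)
{
	return simulate_failure ? -EOPNOTSUPP : 0;
}

/*
 * Hypothetical stand-in for cxl_mem_probe(). A scrub init failure is
 * reported as a warning, but it no longer fails the probe.
 */
static int mock_mem_probe(int media_ready, int scrub_init_fails)
{
	int rc;

	if (!media_ready)
		return -EBUSY;

	rc = mock_patrol_scrub_init(scrub_init_fails);
	if (rc)
		/* dev_warn() in the real driver; stderr in this mock. */
		fprintf(stderr, "CXL patrol scrub init failed (%d), continuing\n", rc);

	return 0;
}
```

With this shape, mock_mem_probe(1, 1) still returns 0, and only the media
not being ready makes the probe fail.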
> * Why is this not located in cxl_region_probe()? If the ras2 scrub is an
> HPA-based scrub I think CXL should do the work to interface with the scrub
> interface at the same level. This also provides another in-kernel user
> for all the DPA-to-HPA translation infrastructure that the CXL driver
> contains. Pretty much the only reason the CXL driver needs to exist at
> all is address translation, so at a minimum it seems a waste to inflict
> more need to understand DPAs on userspace.
As you might expect this will get messy - I'm not saying it's a bad thing
to do, but complexities that come to mind include:
* Scrub is device wide (unlike RAS2, which in theory supports HPA range
  control). So if you map a given DPA range into multiple regions then the
  controls will interfere. Maybe scrubbing at the max rate requested for any
  region is fine.
* Interleave - so we'd be controlling multiple hardware scrubbers.
* Scrub control comes and goes with regions. Do we stop scrubbing if there
  is no region? Not sure.
My guess is the breakdown is:
1) Component registered for each CXL mem device to handle the control +
   combining of all region-specific requests.
2) Region-specific component that exposes the controls on an HPA basis, and
   requests a minimum service level from all its CXL mem device drivers.
3) Device-specific scrub instance (perhaps) reflecting that some scrub may
   make sense when not yet in a region (identify bad mem etc).
So I think we will end up with a lot more layering in here, but the end
result will indeed be better.
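
As a minimal sketch of the combining step in (1), assuming each region
requests a scrub cycle in hours and the device reports min/max limits as in
the patch description: the device is programmed with the shortest cycle any
region asked for, clamped to the device limits. The struct and function names
are made up for illustration, and falling back to the slowest cycle when no
region has a request is just one possible answer to the open question above.

```c
#include <stddef.h>

/* Hypothetical: scrub cycle limits reported by one CXL mem device. */
struct scrub_limits {
	unsigned int min_cycle_hours;
	unsigned int max_cycle_hours;
};

/*
 * Combine per-region requests for one device. A shorter cycle means a
 * higher scrub rate, so the shortest requested cycle wins. The result
 * is clamped to what the device supports.
 *
 * Assumption: with no region requests, use the slowest supported cycle
 * rather than disabling scrub.
 */
static unsigned int combine_scrub_cycle_hours(const struct scrub_limits *lim,
					      const unsigned int *req_hours,
					      size_t nr_req)
{
	unsigned int cycle = lim->max_cycle_hours;
	size_t i;

	for (i = 0; i < nr_req; i++) {
		if (req_hours[i] < cycle)
			cycle = req_hours[i];
	}

	if (cycle < lim->min_cycle_hours)
		cycle = lim->min_cycle_hours;

	return cycle;
}
```

For example, with device limits of 1 to 24 hours and region requests of 12, 4
and 8 hours, the device would be programmed with a 4 hour cycle.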
This series has been going on a while, so I'm not sure the DPA to HPA stuff
was all in place when it started, and at the time I think it was still an
open question whether that should be a userspace problem or not. Anyhow,
time to adapt :)
Jonathan