Linux-EDAC Archive mirror
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: <shiju.jose@huawei.com>, <linux-cxl@vger.kernel.org>,
	<linux-acpi@vger.kernel.org>, <linux-mm@kvack.org>,
	<dave@stgolabs.net>, <dave.jiang@intel.com>,
	<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
	<ira.weiny@intel.com>, <linux-edac@vger.kernel.org>,
	<linux-kernel@vger.kernel.org>, <david@redhat.com>,
	<Vilas.Sridharan@amd.com>, <leo.duran@amd.com>,
	<Yazen.Ghannam@amd.com>, <rientjes@google.com>,
	<jiaqiyan@google.com>, <tony.luck@intel.com>, <Jon.Grimm@amd.com>,
	<dave.hansen@linux.intel.com>, <rafael@kernel.org>,
	<lenb@kernel.org>, <naoya.horiguchi@nec.com>,
	<james.morse@arm.com>, <jthoughton@google.com>,
	<somasundaram.a@hpe.com>, <erdemaktas@google.com>,
	<pgonda@google.com>, <duenwen@google.com>,
	<mike.malvestuto@intel.com>, <gthelen@google.com>,
	<wschwartz@amperecomputing.com>, <dferguson@amperecomputing.com>,
	<wbs@os.amperecomputing.com>, <nifan.cxl@gmail.com>,
	<tanxiaofei@huawei.com>, <prime.zeng@hisilicon.com>,
	<kangkang.shen@futurewei.com>, <wanghuiqiang@huawei.com>,
	<linuxarm@huawei.com>
Subject: Re: [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature
Date: Fri, 10 May 2024 12:23:25 +0100	[thread overview]
Message-ID: <20240510122325.00005e83@Huawei.com> (raw)
In-Reply-To: <663d69c61db8c_3d7b4294e0@dwillia2-mobl3.amr.corp.intel.com.notmuch>

On Thu, 9 May 2024 17:26:46 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> shiju.jose@ wrote:
> > From: Shiju Jose <shiju.jose@huawei.com>
> > 
> > CXL spec 3.1 section 8.2.9.9.11.1 describes the device patrol scrub control
> > feature. The device patrol scrub proactively locates and makes corrections
> > to errors in regular cycle.
> > 
> > Allow specifying the number of hours within which the patrol scrub must be
> > completed, subject to minimum and maximum limits reported by the device.
> > Also allow disabling scrub allowing trade-off error rates against
> > performance.
> > 
> > Register with scrub subsystem to provide scrub control attributes to the
> > user.
> > 
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Signed-off-by: Shiju Jose <shiju.jose@huawei.com>  
> [..]
> > diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> > index 0c79d9ce877c..399e43463626 100644
> > --- a/drivers/cxl/mem.c
> > +++ b/drivers/cxl/mem.c
> > @@ -117,6 +117,12 @@ static int cxl_mem_probe(struct device *dev)
> >  	if (!cxlds->media_ready)
> >  		return -EBUSY;
> >  
> > +	rc = cxl_mem_patrol_scrub_init(cxlmd);
> > +	if (rc) {
> > +		dev_dbg(&cxlmd->dev, "CXL patrol scrub init failed\n");
> > +		return rc;
> > +	}  
> 
> 2 concerns:
> 
> * Why should cxl_mem_probe() fail just because this optional
>   scrub interface did not register?
> 

Flip the dev_dbg to dev_warn() and indeed carry on.

> * Why is this not located in cxl_region_probe()? If the ras2 scrub is an
>   HPA-based scrub I think CXL should do the work to interface with the scrub
>   interface at the same level. This also provides another in-kernel user
>   for all the DPA-to-HPA translation infrastructure that the CXL driver
>   contains. Pretty much the only reason the CXL driver needs to exist at
>   all is address translation, so at a minimum it seems a waste to inflict
>   more need to understand DPAs on userspace.

As you might expect this will get messy - I'm not saying it's a bad thing
to do, but complexities that come to mind include:

* Scrub is device wide (unlike RAS2 which in theory supports HPA range control)
  So if you map a given DPA range into multiple regions then the controls
  will interfere.  Maybe scrub at max rate requested for any region is fine.
* Interleave - so we'd be controlling multiple hardware scrubbers.
* Comes and goes with regions.  Do we stop scrubbing if no region?  Not sure.

My guess is break down is:
1) Component registered for each CXL mem device to handle the control + combining
   of all regions specific requests.
2) Region specific component that exposes the controls on HPA basis, and
   requests from all it's CXL mem device drivers a minimum service level.
3) Device specific scrub instance (perhaps) reflecting that some scrub may
   make sense when not yet in a region (identify bad mem etc).

So I think we will end up with a lot more layering in here, but end result
will indeed be better.

This has been going on a while, so not sure the DPA to HPA stuff was all in place
and at the time I think was still an open question of whether that should be
a userspace problem or not.  Anyhow time to adapt :)

Jonathan




  reply	other threads:[~2024-05-10 11:23 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-19 16:47 [RFC PATCH v8 00/10] ras: scrub: introduce subsystem + CXL/ACPI-RAS2 drivers shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 01/10] ras: scrub: Add scrub subsystem shiju.jose
2024-04-24 20:25   ` fan
2024-04-25 10:38     ` Shiju Jose
2024-04-25 10:15   ` Borislav Petkov
2024-04-25 18:11     ` Shiju Jose
2024-05-06 10:30       ` Borislav Petkov
2024-05-08 16:59         ` Shiju Jose
2024-05-08 17:20           ` Borislav Petkov
2024-05-08 17:44             ` Shiju Jose
2024-05-08 19:25               ` Borislav Petkov
2024-05-09  9:19                 ` Jonathan Cameron
2024-05-09 15:52                   ` Borislav Petkov
2024-05-09 20:03                     ` Borislav Petkov
2024-05-09 21:21                       ` Dan Williams
2024-05-09 21:51                         ` Borislav Petkov
2024-05-09 22:59                           ` Dan Williams
2024-05-10  9:25                             ` Borislav Petkov
2024-05-10 17:13                               ` Dan Williams
2024-05-11 10:17                                 ` Borislav Petkov
2024-05-17 11:15                                   ` Jonathan Cameron
2024-05-17 11:44                                     ` Jonathan Cameron
2024-05-21  8:06                                       ` Borislav Petkov
2024-05-22  9:40                                         ` Jonathan Cameron
2024-05-27  9:09                                           ` Borislav Petkov
2024-05-20 10:54                                   ` Shiju Jose
2024-05-20 11:58                                     ` Jonathan Cameron
2024-05-27  9:21                                       ` Borislav Petkov
2024-05-28  9:06                                         ` Jonathan Cameron
2024-06-06 16:05                                           ` Borislav Petkov
2024-05-10 13:31                     ` Jonathan Cameron
2024-05-09 21:47   ` Dan Williams
2024-05-10  9:03     ` Jonathan Cameron
2024-04-19 16:47 ` [RFC PATCH v8 02/10] cxl/mbox: Add GET_SUPPORTED_FEATURES mailbox command shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 03/10] cxl/mbox: Add GET_FEATURE " shiju.jose
2024-04-24 23:19   ` fan
2024-04-25 10:38     ` Shiju Jose
2024-04-19 16:47 ` [RFC PATCH v8 04/10] cxl/mbox: Add SET_FEATURE " shiju.jose
2024-04-25 17:26   ` fan
2024-04-19 16:47 ` [RFC PATCH v8 05/10] cxl/memscrub: Add CXL device patrol scrub control feature shiju.jose
2024-04-26 23:56   ` fan
2024-04-29 11:20     ` Shiju Jose
2024-04-29 12:21       ` Jonathan Cameron
2024-05-10  0:26   ` Dan Williams
2024-05-10 11:23     ` Jonathan Cameron [this message]
2024-04-19 16:47 ` [RFC PATCH v8 06/10] ACPICA: Add __free() based cleanup function for acpi_put_table shiju.jose
2024-04-19 18:06   ` Jonathan Cameron
2024-04-19 16:47 ` [RFC PATCH v8 07/10] platform: Add __free() based cleanup function for platform_device_put shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 08/10] ACPI:RAS2: Add ACPI RAS2 driver shiju.jose
2024-06-05 21:32   ` Daniel Ferguson
2024-04-19 16:47 ` [RFC PATCH v8 09/10] ras: scrub: Add scrub control attributes for ACPI RAS2 shiju.jose
2024-04-19 16:47 ` [RFC PATCH v8 10/10] ras: scrub: ACPI RAS2: Add memory ACPI RAS2 driver shiju.jose
2024-06-05 21:33   ` Daniel Ferguson
2024-06-07 15:46     ` Shiju Jose

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240510122325.00005e83@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Jon.Grimm@amd.com \
    --cc=Vilas.Sridharan@amd.com \
    --cc=Yazen.Ghannam@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dferguson@amperecomputing.com \
    --cc=duenwen@google.com \
    --cc=erdemaktas@google.com \
    --cc=gthelen@google.com \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jiaqiyan@google.com \
    --cc=jthoughton@google.com \
    --cc=kangkang.shen@futurewei.com \
    --cc=lenb@kernel.org \
    --cc=leo.duran@amd.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxarm@huawei.com \
    --cc=mike.malvestuto@intel.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=nifan.cxl@gmail.com \
    --cc=pgonda@google.com \
    --cc=prime.zeng@hisilicon.com \
    --cc=rafael@kernel.org \
    --cc=rientjes@google.com \
    --cc=shiju.jose@huawei.com \
    --cc=somasundaram.a@hpe.com \
    --cc=tanxiaofei@huawei.com \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    --cc=wanghuiqiang@huawei.com \
    --cc=wbs@os.amperecomputing.com \
    --cc=wschwartz@amperecomputing.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).