QEMU-Devel Archive mirror
From: Jonathan Cameron via <qemu-devel@nongnu.org>
To: Yuquan Wang <wangyuquan1236@phytium.com.cn>
Cc: <qemu-devel@nongnu.org>, <linux-cxl@vger.kernel.org>,
	<dan.williams@intel.com>
Subject: Re: CXL numa error on arm64 qemu virt machine
Date: Fri, 17 May 2024 19:03:56 +0100
Message-ID: <20240517190356.0000582a@Huawei.com>
In-Reply-To: <20240517111441.00002279@Huawei.com>

On Fri, 17 May 2024 11:14:41 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Fri, 17 May 2024 18:07:07 +0800
> Yuquan Wang <wangyuquan1236@phytium.com.cn> wrote:
> 
> > On Fri, May 10, 2024 at 06:16:46PM +0100, Jonathan Cameron wrote:  
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/jic23/cxl-staging.git/log/?h=arm-numa-fixes
> > >     
> > Thank you :)  
> > > I've run out of time to sort out cover letters and things + just before the merge
> > > window is never a good time to get anyone to pay attention to potentially controversial
> > > patches.  So for now I've thrown up a branch on kernel.org with Robert's
> > > series of fixes of related code (that's queued in the ACPI tree for the merge window)
> > > and Dan Williams (from several years ago) + my additions that 'work' (lightly tested)
> > > on qemu/arm64 with the generic port patches etc. 
> > > 
> > > I'll send out an RFC in a couple of weeks.  In meantime let me know if you
> > > run into any problems or have suggestions to improve them.
> > > 
> > > Jonathan
> > >    
> > With the latest commit(d077bf9) in the 'arm-numa-fixes', the qemu virt
> > could create a cxl region with a new numa node (node 2) just like x86.
> > At this stage(the first time to create cxl region), everything works
> > fine.
> > 
> > However, if I use below commands to delete the created cxl region:
> > 
> > `daxctl offline-memory dax0.0`
> > `cxl disable-region region0`
> > `cxl destroy-region region0`
> > 
> > and then recreate it by `cxl create-region -d decoder0.0 -t ram`, the
> > kernel could not create the numa node2 again, and the kernel will print:
> > 
> > [  589.458971] Fallback order for Node 0: 0 1
> > [  589.459136] Fallback order for Node 1: 1 0
> > [  589.459175] Fallback order for Node 2: 0 1
> > [  589.459213] Built 2 zonelists, mobility grouping on.  Total pages: 1009890
> > [  589.459284] Policy zone: Normal  
> 
> I'll see if I can figure out what is happening there.

So I know what is happening, but I'm not sure of the solution yet.
The issue is that on unbind of the region there is a call to try_remove_memory(),
which calls memblock_phys_free(). That removes the reserved memblocks being used
for tracking the NUMA node, so when you bind a region at that HPA again, there
is no tracking information.
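
To make the sequence above concrete, here is a toy model of it. This is
a sketch only, not kernel code: the class, the helper names and the
address/size values are made up for illustration, and the real
memblock_phys_free() and try_remove_memory() do a lot more than this.
The only thing it tries to show is that if the node lookup depends on a
reserved range, and the remove path frees that range, a second bind at
the same HPA finds nothing.

```python
# Toy model only: NOT kernel code. Names loosely mirror the kernel
# functions mentioned in this thread, but every class and helper here
# is hypothetical and heavily simplified.
from dataclasses import dataclass

NUMA_NO_NODE = -1


@dataclass
class ReservedRange:
    base: int
    size: int
    nid: int


class ToyMemblock:
    """Keeps reserved ranges that record which node an HPA range maps to."""

    def __init__(self):
        self.reserved = []

    def reserve_for_node(self, base, size, nid):
        self.reserved.append(ReservedRange(base, size, nid))

    def phys_free(self, base, size):
        # Simplification: drop every reserved range fully inside
        # [base, base + size).
        end = base + size
        self.reserved = [
            r for r in self.reserved
            if not (base <= r.base and r.base + r.size <= end)
        ]

    def target_node(self, addr):
        for r in self.reserved:
            if r.base <= addr < r.base + r.size:
                return r.nid
        return NUMA_NO_NODE


def bind_region(mb, hpa):
    # Binding looks up the node from the tracking information.
    return mb.target_node(hpa)


def unbind_region(mb, hpa, size):
    # Stand-in for the remove path: it frees the range, which in this
    # model also discards the node tracking.
    mb.phys_free(hpa, size)


def demo():
    mb = ToyMemblock()
    hpa, size = 0x10000000000, 0x10000000  # made-up values
    mb.reserve_for_node(hpa, size, nid=2)
    first = bind_region(mb, hpa)    # tracking present
    unbind_region(mb, hpa, size)
    second = bind_region(mb, hpa)   # tracking gone
    return first, second


if __name__ == "__main__":
    print(demo())
```

In this model the first bind resolves to node 2 and the second returns
NUMA_NO_NODE, which is the "only works once" behaviour in miniature.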

So far I haven't figured out why that call is there in the first place
which isn't helping me solve this.

https://elixir.bootlin.com/linux/v6.9.1/source/mm/memory_hotplug.c#L2286

Until I get this code out there, it's kind of hard to ask the mm folk,
so for now I may just have to say it only works once and point at that
line as the problem in an RFC.

Long shot, but Dan, did you run into this when you were doing your
[PATCH v2 08/22] memblock: Introduce a generic phys_addr_to_target_node()
stuff?  I assume that ultimately called try_remove_memory() in a remove
path somewhere and, similarly to this, if you tried putting the memory back
the tracking would be missing.  Or alternatively, any idea what that
memblock_phys_free() is balancing with?

Jonathan




> > 
> > Meanwhile, the qemu reports that: 
> > 
> > "qemu-system-aarch64: virtio: bogus descriptor or out of resources"  
> 
> That sounds like another TCG issue, or possibly the DMA bounce buffer
> problem resurfacing.  It's not directly related to this NUMA aspect unless
> something very odd is going on.  I'm even more confused because I think
> you are not using kmem with the above commands, so we shouldn't be using
> the CXL memory for virtio.
> 
> Just to check, you aren't running with KVM I hope?  That opens a much
> bigger problem set. :(
> 
> Jonathan
> 
> 
> 
> > 
> > Many thanks
> > Yuquan
> >   
> 
> 




Thread overview: 7+ messages
2024-05-08  8:00 CXL numa error on arm64 qemu virt machine Yuquan Wang
2024-05-08 12:02 ` Jonathan Cameron via
2024-05-09  8:35   ` Yuquan Wang
2024-05-10 17:16     ` Jonathan Cameron via
2024-05-17 10:07       ` Yuquan Wang
2024-05-17 10:14         ` Jonathan Cameron via
2024-05-17 18:03           ` Jonathan Cameron via [this message]
