Regressions List Tracking
 help / color / mirror / Atom feed
From: "Nícolas F. R. A. Prado" <nfraprado@collabora.com>
To: regressions@lists.linux.dev
Cc: linux-kernel@vger.kernel.org, kernel@collabora.com
Subject: Re: [REGRESSION] boot regression on linux-next on sc7180 platforms - null pointer dereference on iommu_dma_sync_sg_for_device
Date: Wed, 15 May 2024 17:05:53 -0400	[thread overview]
Message-ID: <d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano> (raw)
In-Reply-To: <6d0da90d-f405-4b5d-b7f9-238cc9405ebe@notapiano>

On Tue, May 14, 2024 at 12:41:29PM -0400, Nícolas F. R. A. Prado wrote:
> Hi,
> 
> KernelCI has identified a new boot regression on linux-next. It affects the
> following platforms:
> * sc7180-trogdor-kingoftown
> * sc7180-trogdor-lazor-limozeen
> 
> The regression was introduced in next-20240509, and still affects today's
> (next-20240514) release.
> 
> The config used was the upstream arm64 defconfig with a config fragment on top
> [1].
> 
> The following stack traces are produced during boot and a usable shell is never
> reached:
> 
> [    0.381981] Unable to handle kernel NULL pointer dereference at virtual address 000000000000001c
> [    0.381989] Mem abort info:
> [    0.381991]   ESR = 0x0000000096000004
> [    0.381994]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    0.381997]   SET = 0, FnV = 0
> [    0.382000]   EA = 0, S1PTW = 0
> [    0.382003]   FSC = 0x04: level 0 translation fault
> [    0.382006] Data abort info:
> [    0.382008]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
> [    0.382011]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> [    0.382014]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> [    0.382017] [000000000000001c] user address but active_mm is swapper
> [    0.382021] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
> [    0.382025] Modules linked in:
> [    0.382032] CPU: 4 PID: 68 Comm: kworker/u32:2 Not tainted 6.9.0-next-20240514-dirty #380
> [    0.382038] Hardware name: Google Kingoftown (DT)
> [    0.382042] Workqueue: async async_run_entry_fn
> [    0.382055] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [    0.382061] pc : iommu_dma_sync_sg_for_device+0x28/0x100
> [    0.382070] lr : __dma_sync_sg_for_device+0x28/0x4c
> [    0.382080] sp : ffff800080943740
> [    0.382082] x29: ffff800080943740 x28: ffff36ee44280000 x27: ffff36ee40bd7810
> [    0.382092] x26: ffff800080943998 x25: ffff36ee44280480 x24: ffffb54600bcf0e8
> [    0.382101] x23: ffff36ee40bd7810 x22: 0000000000000001 x21: 0000000000000000
> [    0.382110] x20: ffffb54600f3d098 x19: 0000000000000000 x18: ffffb54601c1a210
> [    0.382118] x17: 000000040044ffff x16: 0000000000000000 x15: ffff36efb6d95580
> [    0.382126] x14: ffff36ee409156c0 x13: 0000000000001797 x12: 0000000000000002
> [    0.382134] x11: 0000000000000004 x10: ffff36ee4308b3d8 x9 : ffff36ee44280469
> [    0.382143] x8 : ffff36ee4308b304 x7 : 00000000ffffffff x6 : 0000000000000001
> [    0.382151] x5 : ffffb5460033a740 x4 : ffffb545ff50375c x3 : 0000000000000001
> [    0.382159] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff36ee40bd7810
> [    0.382167] Call trace:
> [    0.382170]  iommu_dma_sync_sg_for_device+0x28/0x100
> [    0.382176]  __dma_sync_sg_for_device+0x28/0x4c
> [    0.382183]  spi_transfer_one_message+0x378/0x6e4
> [    0.382193]  __spi_pump_transfer_message+0x190/0x4a4
> [    0.382199]  __spi_sync+0x2a0/0x3c4
> [    0.382205]  spi_sync_locked+0x10/0x1c
> [    0.382211]  tpm_tis_spi_transfer_full+0x160/0x2fc
> [    0.382217]  tpm_tis_spi_transfer+0x34/0x40
> [    0.382221]  tpm_tis_spi_cr50_read_bytes+0x5c/0x90
> [    0.382226]  tpm_tis_core_init+0xfc/0x7e0
> [    0.382231]  tpm_tis_spi_init+0x54/0x70
> [    0.382236]  cr50_spi_probe+0xf4/0x27c
> [    0.382241]  tpm_tis_spi_driver_probe+0x34/0x64
> [    0.382245]  spi_probe+0x84/0xe4
> [    0.382251]  really_probe+0xbc/0x2a0
> [    0.382258]  __driver_probe_device+0x78/0x12c
> [    0.382264]  driver_probe_device+0x40/0x160
> [    0.382269]  __device_attach_driver+0xb8/0x134
> [    0.382275]  bus_for_each_drv+0x84/0xe0
> [    0.382280]  __device_attach_async_helper+0xac/0xd0
> [    0.382286]  async_run_entry_fn+0x34/0xe0
> [    0.382291]  process_one_work+0x154/0x298
> [    0.382300]  worker_thread+0x304/0x408
> [    0.382307]  kthread+0x118/0x11c
> [    0.382313]  ret_from_fork+0x10/0x20
> [    0.382324] Code: 2a0203f5 2a0303f6 a90363f7 aa0003f7 (b9401c20)
> [    0.382328] ---[ end trace 0000000000000000 ]---

Tracked down the issue to commit 8cc3bad9d9d6 ("spi: Remove unneded check for
orig_nents").

#regzbot introduced: 8cc3bad9d9d6

The issue happens because in spi_dma_sync_for_device(), the line

	dma_sync_sgtable_for_device(tx_dev, &xfer->tx_sg, DMA_TO_DEVICE);

is passing a scatterlist table (xfer->tx_sg) that hasn't been initialized, so
the sgl pointer inside it is null. Before the patch, the check would prevent it
from being called since orig_nents is 0.

This initialization of the scatterlist table should happen in
spi_map_buf_attrs(), which should be called in __spi_map_msg(), however some
debugging revealed that the previous check

		if (!ctlr->can_dma(ctlr, msg->spi, xfer))
	
is failing, which is why the scatterlist table isn't initialized. Despite that,
ctlr->cur_msg_mapped is set to true, so spi_dma_sync_for_device() doesn't return
early, which would completely avoid the issue. At this point I'm confused why
that flag tracks DMA mapping per message, if the mapping is done per transfer
(and a message can contain multiple transfers). Maybe that's what needs to
change, though I'd like the input from someone who is familiar with this code.

Thanks,
Nícolas

  reply	other threads:[~2024-05-15 21:05 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-14 16:41 [REGRESSION] boot regression on linux-next on sc7180 platforms - null pointer dereference on iommu_dma_sync_sg_for_device Nícolas F. R. A. Prado
2024-05-15 21:05 ` Nícolas F. R. A. Prado [this message]
2024-05-16 21:32   ` Nícolas F. R. A. Prado
2024-05-22 10:00 ` Neil Armstrong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d3679496-2e4e-4a7c-97ed-f193bd53af1d@notapiano \
    --to=nfraprado@collabora.com \
    --cc=kernel@collabora.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=regressions@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).