From: Lior Weintraub <liorw@pliops.com>
To: Dirk Behme <dirk.behme@gmail.com>,
"linux-embedded@vger.kernel.org" <linux-embedded@vger.kernel.org>
Subject: RE: Debugging early SError exception
Date: Thu, 21 Dec 2023 10:04:41 +0000
Message-ID: <PR3P195MB0555AA259A8E616B6C5BA823C395A@PR3P195MB0555.EURP195.PROD.OUTLOOK.COM>
In-Reply-To: <b139c136-0417-4ac5-b99a-ba999d7418a0@gmail.com>
Thanks Dirk,

Regarding earlyprintk, I am not sure how to make it work.
I have set CONFIG_EARLY_PRINTK=y and CONFIG_DEBUG_LL=y in my config, but it doesn't seem to work.
Do I need to pass something in the bootargs from U-Boot?
Do I need to add it to my device tree?
(I tried setting bootargs = "console=ttyS0,115200 earlyprintk"; under "chosen" in my DT, but it didn't work.)
The UART I am using is "snps,dw-apb-uart".
Last week, to output the early logs, I implemented this hack:
1. Modify the printk macro to call my print_func.
2. print_func writes each character into a single global variable (u32 simul_uart;).
3. Take the address of this global variable and extract all writes to it from the Tarmac logs.
This is a very slow and tedious process, but it helped me identify the initial SError.
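
The hack described above can be sketched roughly as follows. This is a minimal user-space illustration, not the actual kernel change: print_func and simul_uart are the names from the list, while simul_uart_count is a hypothetical counter added only so that the sketch can be checked.

```c
#include <stdint.h>

/* Single global "fake UART" word. Every store to it appears in the
 * Tarmac log as a write to one fixed address. volatile keeps the
 * compiler from merging or dropping the individual stores. */
volatile uint32_t simul_uart;

/* Hypothetical helper counter, only used to check the sketch. */
unsigned long simul_uart_count;

/* Emit a string as one store per character. */
void print_func(const char *s)
{
    while (*s) {
        simul_uart = (uint32_t)(unsigned char)*s++;
        simul_uart_count++;
    }
}
```

Filtering the trace for stores to the address of simul_uart then yields the characters in order.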
Initially I thought I could write directly into the UART FIFO register (whose physical address I know), but this didn't work because Linux has already set up the MMU, so I guess I need to know the virtual address of this FIFO.
Do I need to use __phys_to_virt or something similar?

Cheers,
Lior.
> -----Original Message-----
> From: Dirk Behme <dirk.behme@gmail.com>
> Sent: Thursday, December 21, 2023 10:30 AM
> To: Lior Weintraub <liorw@pliops.com>; linux-embedded@vger.kernel.org
> Subject: Re: Debugging early SError exception
>
> On 21.12.23 at 08:43, Lior Weintraub wrote:
> > Hi Dirk,
> >
> > We found that the issue was at the early stages of Barebox (a.k.a. U-Boot
> > v2).
>
> Glad to hear that! :)
>
> > Our implementation of putc_ll (on debug_ll) was writing into the UART Tx
> > FIFO without checking if the FIFO is full.
> > Once the FIFO got full it caused this SError, probably because the UART IP
> > generated an apberror signal.
>
> Thanks for the report!
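
The missing check described above can be sketched like this. It is a minimal illustration, not the actual Barebox code: putc_ll_checked is a hypothetical name, and the register layout assumes a 16550-compatible UART with 32-bit registers at a 4-byte stride. How THRE behaves with FIFOs enabled depends on how the UART is configured, so the databook should be checked before relying on it.

```c
#include <stdint.h>

/* 16550-style register indices (32-bit registers, 4-byte stride).
 * Assumed layout for illustration only. */
#define UART_THR      0           /* transmit holding register, offset 0x00 */
#define UART_LSR      5           /* line status register, offset 0x14 */
#define UART_LSR_THRE (1u << 5)   /* transmit holding register empty */

/* Write one character, but only once the transmitter can accept it. */
void putc_ll_checked(volatile uint32_t *uart_base, char c)
{
    /* Poll the status register instead of writing blindly into a
     * FIFO that may already be full. */
    while (!(uart_base[UART_LSR] & UART_LSR_THRE))
        ;
    uart_base[UART_THR] = (uint32_t)(unsigned char)c;
}
```

Waiting for THRE is conservative; a status bit that reports "FIFO not full", if the UART provides one, would allow the FIFO to be filled further.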
>
> > Now Linux is running and doesn't report the SError again, but we now
> > face another issue.
> > We see that the PC is getting into a "report_bug" function.
> > Linux doesn't print anything to the UART (probably since it hasn't got to
> > the point where the console is configured?).
>
> For cases like this using earlyprintk is usually a good option. Check
> the Linux kernel serial console (UART) driver of your SoC if it
> supports it. In the end it should be "just" a function in the serial
> console driver which outputs the console data via polling before
> (later) the interrupt driven console part takes over.
>
> Best regards
>
> Dirk
>
>
> > Since our debug means are limited, it can take some time to find the root
> > cause.
> >
> > I will keep you posted and update our findings.
> > Love to hear your thoughts,
> >
> > Cheers,
> > Lior.
> >
> >
> >> -----Original Message-----
> >> From: Dirk Behme <dirk.behme@gmail.com>
> >> Sent: Tuesday, December 19, 2023 3:37 PM
> >> To: Lior Weintraub <liorw@pliops.com>; linux-embedded@vger.kernel.org
> >> Subject: Re: Debugging early SError exception
> >>
> >> On 19.12.23 at 14:23, Lior Weintraub wrote:
> >>> Thanks Dirk,
> >>
> >> Welcome :)
> >>
> >> In case you find the root cause it would be nice to get some generic
> >> description of it so that we can learn something :)
> >>
> >> Best regards
> >>
> >> Dirk
> >>
> >>
> >>>> -----Original Message-----
> >>>> From: Dirk Behme <dirk.behme@gmail.com>
> >>>> Sent: Tuesday, December 19, 2023 9:09 AM
> >>>> To: Lior Weintraub <liorw@pliops.com>; linux-embedded@vger.kernel.org
> >>>> Subject: Re: Debugging early SError exception
> >>>>
> >>>> On 17.12.23 at 22:32, Lior Weintraub wrote:
> >>>>> Hi,
> >>>>>
> >>>>> We have a new SoC and are porting embedded Linux to it (kernel v6.5).
> >>>>> The SoC is a single-core ARM64 (Cortex-A53) device.
> >>>>> It runs correctly on QEMU but fails with an SError on our emulation
> >>>>> platform (Synopsys ZeBu running our SoC model).
> >>>>> There is no debugger connected to this emulation, but there are several
> >>>>> debug capabilities we can use:
> >>>>> 1. Generating a wave dump of CPU signals
> >>>>> 2. Generating a Tarmac log
> >>>>> 3. UART
> >>>>>
> >>>>> Since the SError happens at an early stage of the Linux boot, the UART is
> >>>>> not enabled yet.
> >>>>> From the Tarmac log we can see:
> >>>>> 3824884521 ps ES (ffff800080760888:d65f03c0) O el1h_ns: ret (parse_early_param)
> >>>>> 3824884522 ps ES (ffff800080763a60:d2801800) O el1h_ns: mov x0, #0xc0 // #192 (setup_arch)
> >>>>>     R X0 (AARCH64) 00000000 000000c0
> >>>>> 3824884523 ps ES (ffff800080763a64:d51b4220) O el1h_ns: msr daif, x0 (setup_arch)
> >>>>>     R CPSR 600000c5
> >>>>> 3824884529 ps ES System Error (Abort)
> >>>>>     EXC [0x380] SError/vSError Current EL with SP_ELx
> >>>>>     R ESR_EL1 (AARCH64) bf000002
> >>>>>     R CPSR 600003c5
> >>>>>     R SPSR_EL1 (AARCH64) 600000c5
> >>>>>     R ELR_EL1 (AARCH64) ffff8000 80763a68
> >>>>> 3824884925 ps ES (ffff800080010b80:d10543ff) O el1h_ns: sub sp, sp, #0x150 (vectors)
> >>>>>     R SP_EL1 (AARCH64) ffff8000 808f3c50
> >>>>> 3824884925 ps ES (ffff800080010b84:8b2063ff) O el1h_ns: add sp, sp, x0 (vectors)
> >>>>>     R SP_EL1 (AARCH64) ffff8000 808f3d10
> >>>>> 3824884926 ps ES (ffff800080010b88:cb2063e0) O el1h_ns: sub x0, sp, x0 (vectors)
> >>>>>     R X0 (AARCH64) ffff8000 808f3c50
> >>>>> 3824884927 ps ES (ffff800080010b8c:37700080) O el1h_ns: tbnz w0, #14, ffff800080010b9c <vectors+0x39c> (vectors)
> >>>>> 3824884935 ps ES (ffff800080010b90:cb2063e0) O el1h_ns: sub x0, sp, x0 (vectors)
> >>>>>     R X0 (AARCH64) 00000000 000000c0
> >>>>> 3824884937 ps ES (ffff800080010b94:cb2063ff) O el1h_ns: sub sp, sp, x0 (vectors)
> >>>>>     R SP_EL1 (AARCH64) ffff8000 808f3c50
> >>>>> 3824884938 ps ES (ffff800080010b98:140001ef) O el1h_ns: b ffff800080011354 <el1h_64_error> (vectors)
> >>>>>
> >>>>> If I understand correctly, the exception happened sometime earlier, and
> >>>>> only now, when the Linux boot code (setup_arch) unmasked asynchronous
> >>>>> aborts, do we immediately jump to the SError exception handler.
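
The register values in the trace support this reading, and a small decoder makes the check explicit. This is a sketch under stated assumptions: the helper names are hypothetical, and the bit positions are the architectural AArch64 ones (exception class in ESR bits [31:26], the implementation-defined syndrome flag for SError in bit 24, the PSTATE A mask in bit 8). The remaining low syndrome bits are implementation defined and would have to be looked up in the Cortex-A53 documentation.

```c
#include <stdbool.h>
#include <stdint.h>

#define ESR_EC_SHIFT    26
#define ESR_EC_MASK     0x3fu
#define ESR_EC_SERROR   0x2fu       /* exception class: SError interrupt */
#define ESR_SERROR_IDS  (1u << 24)  /* syndrome is implementation defined */

#define PSR_A_BIT       (1u << 8)   /* SError (asynchronous abort) mask */

/* Extract the exception class from an ESR_ELx value. */
static inline uint32_t esr_ec(uint32_t esr)
{
    return (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;
}

/* True if the syndrome reports an SError interrupt. */
static inline bool esr_is_serror(uint32_t esr)
{
    return esr_ec(esr) == ESR_EC_SERROR;
}

/* True if the SError syndrome is in an implementation-defined format. */
static inline bool esr_serror_impl_defined(uint32_t esr)
{
    return (esr & ESR_SERROR_IDS) != 0;
}

/* True if SErrors are masked in the given PSTATE/SPSR value. */
static inline bool serror_masked(uint32_t pstate)
{
    return (pstate & PSR_A_BIT) != 0;
}
```

With the values from the log, ESR_EL1 = bf000002 decodes to exception class 0x2f with an implementation-defined syndrome, and SPSR_EL1 = 600000c5 has the A bit clear, so the SError was taken as soon as msr daif removed the mask.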
> >>>>
> >>>>
> >>>> Yes, that sounds reasonable. If I understood correctly, you are
> >>>> running something "quite new" on software (QEMU) and hardware
> >>>> (Synopsys) simulators.
> >>>>
> >>>> That would mean that you have new hardware with, e.g., a new memory
> >>>> map not used before. What you describe sounds like something in the
> >>>> code before Linux (the boot loader) is causing the SError. This
> >>>> might be an access to non-existent or non-enabled hardware, i.e. it
> >>>> might be that you try to access (read/write) an address that is not
> >>>> available yet (or is just invalid). That is hard to debug. In case you
> >>>> are able to modify the code before Linux (the boot loader?), you might
> >>>> try to enable SError exceptions there, too, to get the exception
> >>>> earlier and with that make the search window smaller. I'm not that
> >>>> familiar with QEMU, but could you try to trace which (all?) hardware
> >>>> accesses your code does, analyse them, and check whether all of them
> >>>> are also valid on the hardware (Synopsys) emulation system? That
> >>>> should be checked from both the valid-address and the hardware
> >>>> subsystem enablement point of view.
> >>>>
> >>>> Hth,
> >>>>
> >>>> Dirk
> >>>>
> >>>>
> >>>>> From the Linux source:
> >>>>> parse_early_param();
> >>>>>
> >>>>> dynamic_scs_init();
> >>>>>
> >>>>> /*
> >>>>> * Unmask asynchronous aborts and fiq after bringing up possible
> >>>>> * earlycon. (Report possible System Errors once we can report this
> >>>>> * occurred).
> >>>>> */
> >>>>> local_daif_restore(DAIF_PROCCTX_NOIRQ); <---- This is when we get the exception.
> >>>>>
> >>>>> After some kernel hacking (replacing printk) we could extract the logs:
> >>>>> 6Booting Linux on physical CPU 0x0000000000 [0x410fd034]
> >>>>> 5Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #101 SMP Sun Dec 17 20:09:06 IST 2023
> >>>>> 6Machine model: Pliops Spider MK-I EVK
> >>>>> 2SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
> >>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> >>>>> Hardware name: Pliops Spider MK-I EVK (DT)
> >>>>> pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>> pc : setup_arch+0x13c/0x5ac
> >>>>> lr : setup_arch+0x134/0x5ac
> >>>>> sp : ffff8000808f3da0
> >>>>> x29: ffff8000808f3da0c x28: 0000000008758074c x27: 0000000005e31b58c
> >>>>> x26: 0000000000000001c x25: 0000000007e5f728c x24: ffff8000808f8000c
> >>>>> x23: ffff8000808f8600c x22: ffff8000807b6000c x21: ffff800080010000c
> >>>>> x20: ffff800080a1e000c x19: fffffbfffddfe190c x18: 000000002266684ac
> >>>>> x17: 00000000fcad60bbc x16: 0000000000001800c x15: 0000000000000008c
> >>>>> x14: ffffffffffffffffc x13: 0000000000000000c x12: 0000000000000003c
> >>>>> x11: 0101010101010101c x10: ffffffffffee87dfc x9 : 0000000000000038c
> >>>>> x8 : 0101010101010101c x7 : 7f7f7f7f7f7f7f7fc x6 : 0000000000000001c
> >>>>> x5 : 0000000000000000c x4 : 8000000000000000c x3 : 0000000000000065c
> >>>>> x2 : 0000000000000000c x1 : 0000000000000000c x0 : 00000000000000c0c
> >>>>> 0Kernel panic - not syncing: Asynchronous SError Interrupt
> >>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
> >>>>> Hardware name: Pliops Spider MK-I EVK (DT)
> >>>>> Call trace:
> >>>>> dump_backtrace+0x9c/0xd0
> >>>>> show_stack+0x14/0x1c
> >>>>> dump_stack_lvl+0x44/0x58
> >>>>> dump_stack+0x14/0x1c
> >>>>> panic+0x2e0/0x33c
> >>>>> nmi_panic+0x68/0x6c
> >>>>> arm64_serror_panic+0x68/0x78
> >>>>> do_serror+0x24/0x54
> >>>>> el1h_64_error_handler+0x2c/0x40
> >>>>> el1h_64_error+0x64/0x68
> >>>>> setup_arch+0x13c/0x5ac
> >>>>> start_kernel+0x5c/0x5b8
> >>>>> __primary_switched+0xb4/0xbc
> >>>>> 0---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
> >>>>>
> >>>>> Can you please advise how to proceed with debugging?
> >>>>>
> >>>>> Thanks in advance,
> >>>>> Cheers,
> >>>>> Lior.
> >>>>>
> >>>>>
> >>>>
> >>>
> >