linux-embedded.vger.kernel.org archive mirror
From: Dirk Behme <dirk.behme@gmail.com>
To: Lior Weintraub <liorw@pliops.com>,
	"linux-embedded@vger.kernel.org" <linux-embedded@vger.kernel.org>
Subject: Re: Debugging early SError exception
Date: Thu, 21 Dec 2023 12:19:15 +0100
Message-ID: <8140c4c7-10d5-46dc-8c32-8bee7bf95918@gmail.com>
In-Reply-To: <PR3P195MB0555AA259A8E616B6C5BA823C395A@PR3P195MB0555.EURP195.PROD.OUTLOOK.COM>

Am 21.12.23 um 11:04 schrieb Lior Weintraub:
> Thanks Dirk,
> 
> Regarding the earlyprintk, not sure I know how to make it work.
> I have defined CONFIG_EARLY_PRINTK=y and CONFIG_DEBUG_LL=y on my config but it doesn't seem to work.
> Do I need to pass something in the bootargs from the U-BOOT?
> Do I need to add that into my device tree?
> (Tried to set bootargs = "console=ttyS0,115200 earlyprintk"; under "chosen" on my DT but it didn't work)

Yes, which options have to be enabled and what has to be set where is
often confusing. I think this is not the same for all systems, so to be
on the safe side you have to look into the code for your system. Or in
short: the code is the documentation ;)


> The UART I am using is "snps,dw-apb-uart".
> 
> Last week, to output the early logs I have implemented this hack:
> 1. Modify printk macro to run my print_func
> 2. This print_func wrote the characters into a single global variable (u32 simul_uart;)
> 3. Get the address location of this global variable and extract all writes to it from the Tarmac logs.
> 
> This is a very slow and tedious process but it helped me identify the initial SError.
> Initially I thought I can write directly into the UART FIFO register (which I know the address) but this didn't work because Linux already setup the MMU so I guess I need to know the virtual address of this FIFO.
> Do I need to use __phys_to_virt of some sort?

Yes, I think so. Have a look at the existing serial driver, too. It
should do what's needed, and you can borrow from it.
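
As a rough illustration of the polling approach (and of the FIFO check
that was missing in putc_ll), here is a minimal sketch. It is not taken
from any driver. The register layout (16550-style THR at offset 0x00,
LSR at offset 0x14, THRE bit 0x20, 32-bit wide registers) is an
assumption that has to be checked against the databook of the UART, and
reg_read()/reg_write() together with the sim_* variables are
hypothetical stand-ins so that the logic can be run on a host. In the
kernel the accessors would be readl()/writel() on a virtual address
obtained from ioremap() (or from the early fixmap that earlycon uses),
because device registers are normally not covered by the linear map
that __phys_to_virt() translates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 16550-style layout with 4-byte register spacing. */
#define UART_THR      0      /* transmit holding register, offset 0x00 */
#define UART_LSR      5      /* line status register, offset 0x14 */
#define UART_LSR_THRE 0x20u  /* transmitter can accept another character */

/* Hypothetical simulation state standing in for the real UART. */
static char sim_out[64];         /* characters the "UART" accepted */
static size_t sim_len;           /* number of accepted characters */
static unsigned sim_busy_polls;  /* LSR polls left until TX has room again */
static unsigned sim_overruns;    /* writes issued while TX was still busy */

/* Stand-in for readl(base + 4 * idx). */
static uint32_t reg_read(unsigned idx)
{
	if (idx != UART_LSR)
		return 0;
	if (sim_busy_polls > 0) {
		sim_busy_polls--;
		return 0;            /* still busy, THRE not set */
	}
	return UART_LSR_THRE;
}

/* Stand-in for writel(val, base + 4 * idx). */
static void reg_write(unsigned idx, uint32_t val)
{
	if (idx != UART_THR)
		return;
	if (sim_busy_polls > 0) {
		sim_overruns++;      /* the case that must never happen */
		return;
	}
	if (sim_len < sizeof(sim_out) - 1)
		sim_out[sim_len++] = (char)val;
	sim_busy_polls = 2;          /* pretend the TX path is busy for a while */
}

/* Polled output: wait until the transmitter has room, then write. */
static void putc_polled(char c)
{
	while (!(reg_read(UART_LSR) & UART_LSR_THRE))
		;                    /* busy-wait, no interrupts needed */
	reg_write(UART_THR, (uint32_t)(unsigned char)c);
}

static void puts_polled(const char *s)
{
	assert(s != NULL);
	while (*s)
		putc_polled(*s++);
}
```

Calling puts_polled("SError") with this model leaves the six characters
in sim_out and sim_overruns at zero. Dropping the wait loop in
putc_polled() makes sim_overruns count up, which corresponds to the
write into a full Tx FIFO described further down in this thread.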

Best regards

Dirk


> Cheers,
> Lior.
> 
>> -----Original Message-----
>> From: Dirk Behme <dirk.behme@gmail.com>
>> Sent: Thursday, December 21, 2023 10:30 AM
>> To: Lior Weintraub <liorw@pliops.com>; linux-embedded@vger.kernel.org
>> Subject: Re: Debugging early SError exception
>>
>> Am 21.12.23 um 08:43 schrieb Lior Weintraub:
>>> Hi Dirk,
>>>
>>> We found that the issue was at the early stages of Barebox (a.k.a U-BOOT v2).
>>
>> Glad to hear that! :)
>>
>>> Our implementation of putc_ll (on debug_ll) was writing into the UART Tx FIFO without checking if the FIFO is full.
>>> Once the FIFO got full it caused this SError, probably because the UART IP generated an apberror signal.
>>
>> Thanks for the report!
>>
>>> Now the Linux is running and doesn't report the SError again but now we face another issue.
>>> We see that the PC is getting into a "report_bug" function.
>>> The Linux doesn't print anything to the UART (probably since it hasn't got to the point where the console is configured?).
>>
>> For cases like this using earlyprintk is usually a good option. Check
>> whether the Linux kernel serial console (UART) driver of your SoC
>> supports it. In the end it should be "just" a function in the serial
>> console driver which outputs the console data via polling before
>> (later) the interrupt driven console part takes over.
>>
>> Best regards
>>
>> Dirk
>>
>>
>>> Since our debug means are limited it can take some time to find the root cause.
>>>
>>> I will keep you posted and update our findings.
>>> Love to hear your thoughts,
>>>
>>> Cheers,
>>> Lior.
>>>
>>>
>>>> -----Original Message-----
>>>> From: Dirk Behme <dirk.behme@gmail.com>
>>>> Sent: Tuesday, December 19, 2023 3:37 PM
>>>> To: Lior Weintraub <liorw@pliops.com>; linux-embedded@vger.kernel.org
>>>> Subject: Re: Debugging early SError exception
>>>>
>>>> Am 19.12.23 um 14:23 schrieb Lior Weintraub:
>>>>> Thanks Dirk,
>>>>
>>>> Welcome :)
>>>>
>>>> In case you find the root cause it would be nice to get some generic
>>>> description of it so that we can learn something :)
>>>>
>>>> Best regards
>>>>
>>>> Dirk
>>>>
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Dirk Behme <dirk.behme@gmail.com>
>>>>>> Sent: Tuesday, December 19, 2023 9:09 AM
>>>>>> To: Lior Weintraub <liorw@pliops.com>; linux-embedded@vger.kernel.org
>>>>>> Subject: Re: Debugging early SError exception
>>>>>>
>>>>>> Am 17.12.23 um 22:32 schrieb Lior Weintraub:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We have a new SoC with eLinux porting (kernel v6.5).
>>>>>>> This SoC is ARM64 (A53) single core based device.
>>>>>>> It runs correctly on QEMU but fails with SError on emulation platform (Synopsys Zebu running our SoC model).
>>>>>>> There is no debugger connected to this emulation but there are several debug capabilities we can use:
>>>>>>> 1. Generating wave dump of CPU signals
>>>>>>> 2. Generate a Tarmac log
>>>>>>> 3. UART
>>>>>>>
>>>>>>> Since the SError happens at early stages of Linux boot the UART is not enabled yet.
>>>>>>> From the Tarmac log we can see:
>>>>>>>      3824884521 ps  ES  (ffff800080760888:d65f03c0) O el1h_ns:   ret     (parse_early_param)
>>>>>>>      3824884522 ps  ES  (ffff800080763a60:d2801800) O el1h_ns:   mov x0, #0xc0   //      #192    (setup_arch)
>>>>>>>                         R X0 (AARCH64) 00000000 000000c0
>>>>>>>      3824884523 ps  ES  (ffff800080763a64:d51b4220) O el1h_ns:   msr daif, x0    (setup_arch)
>>>>>>>                         R CPSR 600000c5
>>>>>>>      3824884529 ps  ES  System Error (Abort)
>>>>>>>                         EXC [0x380] SError/vSError Current EL with SP_ELx
>>>>>>>                         R ESR_EL1 (AARCH64) bf000002
>>>>>>>                         R CPSR 600003c5
>>>>>>>                         R SPSR_EL1 (AARCH64) 600000c5
>>>>>>>                         R ELR_EL1 (AARCH64) ffff8000 80763a68
>>>>>>>      3824884925 ps  ES  (ffff800080010b80:d10543ff) O el1h_ns:   sub sp, sp, #0x150      (vectors)
>>>>>>>                         R SP_EL1 (AARCH64) ffff8000 808f3c50
>>>>>>>      3824884925 ps  ES  (ffff800080010b84:8b2063ff) O el1h_ns:   add sp, sp, x0  (vectors)
>>>>>>>                         R SP_EL1 (AARCH64) ffff8000 808f3d10
>>>>>>>      3824884926 ps  ES  (ffff800080010b88:cb2063e0) O el1h_ns:   sub x0, sp, x0  (vectors)
>>>>>>>                         R X0 (AARCH64) ffff8000 808f3c50
>>>>>>>      3824884927 ps  ES  (ffff800080010b8c:37700080) O el1h_ns:   tbnz w0, #14, ffff800080010b9c  <vectors+0x39c> (vectors)
>>>>>>>      3824884935 ps  ES  (ffff800080010b90:cb2063e0) O el1h_ns:   sub x0, sp, x0  (vectors)
>>>>>>>                         R X0 (AARCH64) 00000000 000000c0
>>>>>>>      3824884937 ps  ES  (ffff800080010b94:cb2063ff) O el1h_ns:   sub sp, sp, x0  (vectors)
>>>>>>>                         R SP_EL1 (AARCH64) ffff8000 808f3c50
>>>>>>>      3824884938 ps  ES  (ffff800080010b98:140001ef) O el1h_ns:   b ffff800080011354      <el1h_64_error> (vectors)
>>>>>>>
>>>>>>> If I understand correctly, the exception happened sometime earlier and only now Linux boot code (setup_arch) opened the exception handling and as a result we immediately jump to the SError exception handler.
>>>>>>
>>>>>>
>>>>>> Yes, that sounds reasonable. If I understood correctly, you are
>>>>>> running something "quite new" on a software simulator (QEMU) and a
>>>>>> hardware emulator (Synopsys).
>>>>>>
>>>>>> That would mean that you have new hardware with e.g. a new memory
>>>>>> map not used before. What you describe sounds like something in the
>>>>>> code running before Linux (the boot loader) triggers the SError. This
>>>>>> might be an access to non-existing or non-enabled hardware, i.e. you
>>>>>> might be reading or writing an address that is not available yet
>>>>>> (or is just invalid). That is hard to debug. In case you are able to
>>>>>> modify the code before Linux (the boot loader?) you might try to
>>>>>> enable SError exceptions there, too, to get the exception earlier
>>>>>> and make the search window smaller. I'm not that familiar with
>>>>>> QEMU, but could you try to trace which (all?) hardware accesses your
>>>>>> code does, and then check whether all of these accesses are valid
>>>>>> on the hardware (Synopsys) emulation system as well? Check both that
>>>>>> the address is valid and that the hardware subsystem is enabled.
>>>>>>
>>>>>> Hth,
>>>>>>
>>>>>> Dirk
>>>>>>
>>>>>>
>>>>>>> From the Linux source:
>>>>>>>          parse_early_param();
>>>>>>>
>>>>>>>          dynamic_scs_init();
>>>>>>>
>>>>>>>          /*
>>>>>>>           * Unmask asynchronous aborts and fiq after bringing up possible
>>>>>>>           * earlycon. (Report possible System Errors once we can report this
>>>>>>>           * occurred).
>>>>>>>           */
>>>>>>>          local_daif_restore(DAIF_PROCCTX_NOIRQ); <---- This is when we get the exception.
>>>>>>>
>>>>>>> After some kernel hacking (replacing printk) we could extract the logs:
>>>>>>> 6Booting Linux on physical CPU 0x0000000000 [0x410fd034]
>>>>>>> 5Linux version 6.5.0 (pliops@dev-liorw) (aarch64-buildroot-linux-gnu-gcc.br_real (Buildroot 2023.02.1-95-g8391404e23) 11.3.0, GNU ld (GNU Binutils) 2.38) #101 SMP Sun Dec 17 20:09:06 IST 2023
>>>>>>> 6Machine model: Pliops Spider MK-I EVK
>>>>>>> 2SError Interrupt on CPU0, code 0x00000000bf000002 -- SError
>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
>>>>>>> Hardware name: Pliops Spider MK-I EVK (DT)
>>>>>>> pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>>>>>>> pc : setup_arch+0x13c/0x5ac
>>>>>>> lr : setup_arch+0x134/0x5ac
>>>>>>> sp : ffff8000808f3da0
>>>>>>> x29: ffff8000808f3da0c x28: 0000000008758074c x27: 0000000005e31b58c
>>>>>>> x26: 0000000000000001c x25: 0000000007e5f728c x24: ffff8000808f8000c
>>>>>>> x23: ffff8000808f8600c x22: ffff8000807b6000c x21: ffff800080010000c
>>>>>>> x20: ffff800080a1e000c x19: fffffbfffddfe190c x18: 000000002266684ac
>>>>>>> x17: 00000000fcad60bbc x16: 0000000000001800c x15: 0000000000000008c
>>>>>>> x14: ffffffffffffffffc x13: 0000000000000000c x12: 0000000000000003c
>>>>>>> x11: 0101010101010101c x10: ffffffffffee87dfc x9 : 0000000000000038c
>>>>>>> x8 : 0101010101010101c x7 : 7f7f7f7f7f7f7f7fc x6 : 0000000000000001c
>>>>>>> x5 : 0000000000000000c x4 : 8000000000000000c x3 : 0000000000000065c
>>>>>>> x2 : 0000000000000000c x1 : 0000000000000000c x0 : 00000000000000c0c
>>>>>>> 0Kernel panic - not syncing: Asynchronous SError Interrupt
>>>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 6.5.0 #101
>>>>>>> Hardware name: Pliops Spider MK-I EVK (DT)
>>>>>>> Call trace:
>>>>>>>      dump_backtrace+0x9c/0xd0
>>>>>>>      show_stack+0x14/0x1c
>>>>>>>      dump_stack_lvl+0x44/0x58
>>>>>>>      dump_stack+0x14/0x1c
>>>>>>>      panic+0x2e0/0x33c
>>>>>>>      nmi_panic+0x68/0x6c
>>>>>>>      arm64_serror_panic+0x68/0x78
>>>>>>>      do_serror+0x24/0x54
>>>>>>>      el1h_64_error_handler+0x2c/0x40
>>>>>>>      el1h_64_error+0x64/0x68
>>>>>>>      setup_arch+0x13c/0x5ac
>>>>>>>      start_kernel+0x5c/0x5b8
>>>>>>>      __primary_switched+0xb4/0xbc
>>>>>>> 0---[ end Kernel panic - not syncing: Asynchronous SError Interrupt ]---
>>>>>>>
>>>>>>> Can you please advise how to proceed with debugging?
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>> Cheers,
>>>>>>> Lior.


Thread overview: 16+ messages
2023-12-17 21:32 Debugging early SError exception Lior Weintraub
2023-12-19  7:09 ` Dirk Behme
2023-12-19 13:23   ` Lior Weintraub
2023-12-19 13:37     ` Dirk Behme
2023-12-21  7:43       ` Lior Weintraub
2023-12-21  8:29         ` Dirk Behme
2023-12-21 10:04           ` Lior Weintraub
2023-12-21 11:19             ` Dirk Behme [this message]
2023-12-21 11:36               ` Heiko Schocher
2023-12-21 12:04                 ` Lior Weintraub
2023-12-22  7:03                 ` Lior Weintraub
2023-12-22  7:48                   ` Dirk Behme
2023-12-22  8:04                     ` Heiko Schocher
2023-12-24 15:41                       ` Lior Weintraub
2023-12-24 19:12                         ` Lior Weintraub
2023-12-26  7:48                           ` Lior Weintraub
