From: Miquel Raynal <miquel.raynal@bootlin.com>
To: Alexander Dahl <ada@thorsis.com>
Cc: linux-mtd@lists.infradead.org,
	Richard Weinberger <richard@nod.at>,
	Vignesh Raghavendra <vigneshr@ti.com>,
	linux-kernel@vger.kernel.org
Subject: Re: mtd: nand: raw: Possible bug in nand_onfi_detect()?
Date: Wed, 6 Mar 2024 16:48:31 +0100
Message-ID: <20240306164831.29eed907@xps-13>
In-Reply-To: <20240306-shaky-bunion-d28b65ea97d7@thorsis.com>

Hi Alexander,

ada@thorsis.com wrote on Wed, 6 Mar 2024 15:36:04 +0100:

> Hello everyone,
> 
> I think I found a bug in nand_onfi_detect() which was introduced with
> commit c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page
> read to constraint controllers") back in 2020.

Interesting. I don't think this patch broke anything, as
constrained controllers would just not support the read_data_op() call
anyway.

That being said, I don't see why the atmel controller would
refuse this operation, as it is supposed to support all
operations without limitation. This is one of the three issues
you are facing, and it probably needs fixing.

> Background on how I found this: I'm currently struggling getting raw
> nand flash access to fly with an at91 sam9x60 SoC and a S34ML02G1
> Spansion SLC raw NAND flash on a custom board.  The setup is
> comparable to the sam9x60 curiosity board and can be reproduced with
> that one.
> 
> NAND flash on sam9x60 curiosity board works fine with what is in
> mainline Linux kernel.  However after removing the line 'rb-gpios =
> <&pioD 5 GPIO_ACTIVE_HIGH>;' from at91-sam9x60_curiosity.dts all data
> read from the flash appears to be zeros only.  (I did not add that
> line to the dts of my custom board first, this is how I stumbled over
> this.)
> 
> I have no explanation for that behaviour, it should work without R/B#
> by reading the status register, maybe we investigate that
> in depth later.

I don't see why at first glance. The default is "no RB" if no property
is given in the DT, so it should work. Tracing the wait ready function
calls might help.

>  However those all-zeros data reads happen when
> reading the ONFI param page as well as data read from the OOB/spare
> area later, and I bet it's the same with usual data.

Reading data without observing tWB + tR may lead to this.
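
To illustrate what waiting without R/B# amounts to, here is a minimal
user-space sketch. It is not the kernel's nand_soft_waitrdy(); the
fake_chip structure and both function names are made up for the
example. The only facts it relies on are that READ STATUS is command
0x70 and that bit 6 (0x40) of the status byte means "ready".

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_READY 0x40 /* bit 6 of the NAND status byte: 1 = ready */

/* Hypothetical stand-in for a chip: busy for a fixed number of polls. */
struct fake_chip {
	unsigned int busy_polls_left;
};

/* Models one READ STATUS (0x70) data cycle on the fake chip. */
static uint8_t fake_read_status(struct fake_chip *chip)
{
	if (chip->busy_polls_left > 0) {
		chip->busy_polls_left--;
		return 0x00;		/* still busy, tR has not elapsed */
	}
	return STATUS_READY;
}

/*
 * Poll the status byte until the ready bit is set or max_polls is
 * exhausted. Returns true if ready was seen, false on timeout.
 */
static bool soft_wait_ready(struct fake_chip *chip, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (fake_read_status(chip) & STATUS_READY)
			return true;
	}
	return false;
}
```

If the data cycles start before such a loop has seen the ready bit,
the controller samples the bus while the chip is still busy. A real
implementation would also have to wait tWB before the first poll and
put the chip back into data output mode afterwards.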

> This read error reveals a bug in nand_onfi_detect().  After setting
> up some things there's this for loop:
> 
>     for (i = 0; i < ONFI_PARAM_PAGES; i++) {
> 
> For i = 0 nand_read_param_page_op() is called and in my case all zeros
> are returned and thus the CRC calculated does not match the all zeros
> CRC read.  So the usual break on successful reading the first page is
> skipped and for reading the second page nand_change_read_column_op()
> is called.  I think that one always fails on this line:
> 
>     if (offset_in_page + len > mtd->writesize + mtd->oobsize) {
> 
> Those variables contain the following values:
> 
>     offset_in_page: 256
>     len: 256
>     mtd->writesize: 0
>     mtd->oobsize: 0

Indeed. We probably need an extra check that skips the if clause above
when !mtd->writesize.
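
Roughly, the check could look like the following. This is an untested
user-space sketch, not a patch: struct geometry is a reduced,
hypothetical stand-in for the two struct mtd_info fields involved, and
check_column_range() is a made-up name.

```c
#include <errno.h>

/* Hypothetical, reduced stand-in for the geometry in struct mtd_info. */
struct geometry {
	unsigned int writesize;	/* 0 until the chip has been identified */
	unsigned int oobsize;
};

/* Returns 0 if the column range is acceptable, -EINVAL otherwise. */
static int check_column_range(const struct geometry *g,
			      unsigned int offset_in_page, unsigned int len)
{
	/* Geometry unknown (identification phase): nothing to check against. */
	if (!g->writesize)
		return 0;

	if (offset_in_page + len > g->writesize + g->oobsize)
		return -EINVAL;

	return 0;
}
```

With writesize and oobsize both 0, the offset_in_page = 256, len = 256
case from your report passes, while out-of-range accesses are still
rejected once the geometry is known.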

> The condition is true and nand_change_read_column_op() returns with
> -EINVAL, because mtd->writesize and mtd->oobsize are not set yet in
> that code path.  Those are probably initialized later, maybe with
> parameters read from that ONFI param page?
> 
> Returning with error from nand_change_read_column_op() leads to
> jumping out of nand_onfi_detect() early, and no ONFI param page is
> evaluated at all, although the second or third page could be intact.
> 
> I guess this would also fail with any other reason for not matching
> CRCs in the first page, but I have no faulty NAND flash chip to
> confirm that.
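
The behaviour one would expect from the loop can be sketched in user
space as below. Assumptions: all three copies are already in one
buffer (the kernel reads them one at a time), and the helper names
copy_is_valid() and first_valid_copy() are made up. The CRC routine
follows the ONFI definition (polynomial 0x8005, initial value 0x4F4E,
computed over bytes 0 to 253, stored little-endian in bytes 254-255).

```c
#include <stddef.h>
#include <stdint.h>

#define PARAM_PAGE_SIZE	256	/* size of one parameter page copy */
#define PARAM_PAGES	3	/* number of redundant copies checked */
#define CRC_BASE	0x4F4E	/* ONFI CRC-16 initial value */

/* Bitwise CRC-16, polynomial 0x8005, MSB first, no final XOR. */
static uint16_t crc16_onfi(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int i = 0; i < 8; i++)
			crc = (uint16_t)((crc << 1) ^
					 ((crc & 0x8000) ? 0x8005 : 0));
	}
	return crc;
}

/* A copy is valid if the CRC over bytes 0..253 matches bytes 254..255. */
static int copy_is_valid(const uint8_t *copy)
{
	uint16_t stored = (uint16_t)(copy[254] | (copy[255] << 8));

	return crc16_onfi(CRC_BASE, copy, 254) == stored;
}

/* Index of the first copy with a matching CRC, or -1 if none matches. */
static int first_valid_copy(const uint8_t *buf)
{
	for (int i = 0; i < PARAM_PAGES; i++) {
		if (copy_is_valid(buf + i * PARAM_PAGE_SIZE))
			return i;	/* keep going past bad copies */
	}
	return -1;
}
```

Because the initial value is non-zero, the CRC computed over 254 zero
bytes is itself non-zero, so an all-zeros copy never matches its
stored CRC of 0x0000. A bad first copy should only make the loop move
on to the next one, not abort the detection.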

Thanks for the whole report, it is interesting and should lead to fixes:
- why does the controller refuse the datain op?
- why is nand_soft_waitrdy() not enough?
- changing the condition in nand_change_read_column_op()

Can you take care of these?

Thanks,
Miquèl


