From: Nilay Shroff <nilay@linux.ibm.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: Keith Busch <kbusch@kernel.org>,
axboe@fb.com, linux-block@vger.kernel.org,
linux-nvme@lists.infradead.org, Gregory Joyce <gjoyce@ibm.com>
Subject: [Bug Report] nvme-cli fails re-formatting NVMe namespace
Date: Fri, 15 Mar 2024 20:01:33 +0530
Message-ID: <7a3b35dd-7365-4427-95a0-929b28c64e73@linux.ibm.com>
Hi,
We found that the "nvme format ..." command fails to format an NVMe disk with the block size set to 512.
Notes and observations:
======================
This is observed on the latest Linus kernel tree. The same operation worked on kernel v6.8.
Test details:
=============
At system boot, or when the NVMe device is hot-plugged, the NVMe block size is 4096. If we later
try to format it with a block size of 512 (lbaf=2), the format fails. Interestingly, if we start with
an NVMe block size of 512 and later format it with a block size of 4096 (lbaf=0), it does not fail.
Please note that CONFIG_NVME_MULTIPATH is enabled.
Please find below further details:
# lspci
0018:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM173X
# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
/dev/nvme0n1          /dev/ng0n1            S6EUNA0R500358       1.6TB NVMe Gen4 U.2 SSD                  0x1        1.60 TB / 1.60 TB          512 B + 0 B      REV.SN49
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
<snip>
<snip>
nlbaf : 4
flbas : 0
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best (in use)
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 4096 0 4096 4096 0 128 0B
^^^ ^^^
!!!! FAILING TO FORMAT with 512 bytes of block size !!!!
# nvme format /dev/nvme0n1 --lbaf=2 --pil=0 --ms=0 --pi=0 -f
Success formatting namespace:1
failed to set block size to 512
^^^
# lsblk -t /dev/nvme0n1
NAME ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE RA WSAME
nvme0n1 0 4096 0 4096 4096 0 128 0B
^^^ ^^^
# cat /sys/block/nvme0n1/queue/logical_block_size:4096
# cat /sys/block/nvme0n1/queue/physical_block_size:4096
# cat /sys/block/nvme0c0n1/queue/logical_block_size:512
# cat /sys/block/nvme0c0n1/queue/physical_block_size:512
# nvme id-ns /dev/nvme0n1 -H
NVME Identify Namespace 1:
nsze : 0xba4d4ab0
ncap : 0xba4d4ab0
nuse : 0xba4d4ab0
<snip>
<snip>
nlbaf : 4
flbas : 0x2
[6:5] : 0 Most significant 2 bits of Current LBA Format Selected
[4:4] : 0 Metadata Transferred in Separate Contiguous Buffer
[3:0] : 0x2 Least significant 4 bits of Current LBA Format Selected
<snip>
<snip>
LBA Format 0 : Metadata Size: 0 bytes - Data Size: 4096 bytes - Relative Performance: 0 Best
LBA Format 1 : Metadata Size: 8 bytes - Data Size: 4096 bytes - Relative Performance: 0x2 Good
LBA Format 2 : Metadata Size: 0 bytes - Data Size: 512 bytes - Relative Performance: 0x1 Better (in use)
LBA Format 3 : Metadata Size: 8 bytes - Data Size: 512 bytes - Relative Performance: 0x3 Degraded
LBA Format 4 : Metadata Size: 64 bytes - Data Size: 4096 bytes - Relative Performance: 0x3 Degraded
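Note: The output above shows that the NVMe namespace is indeed formatted with lbaf 2 (block size 512).
However, the block queue limits of the multipath node are not updated correctly: nvme0n1 still
reports a logical/physical block size of 4096, while nvme0c0n1 reports 512.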
Git bisect:
==========
Git bisect identifies the following commit as the first bad commit:
8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae is the first bad commit
commit 8f03cfa117e06bd2d3ba7ed8bba70a3dda310cae
Author: Christoph Hellwig <hch@lst.de>
Date: Mon Mar 4 07:04:51 2024 -0700
nvme: don't use nvme_update_disk_info for the multipath disk
Currently nvme_update_ns_info_block calls nvme_update_disk_info both for
the namespace attached disk, and the multipath one (if it exists). This
is very different from how other stacking drivers work, and leads to
a lot of complexity.
Switch to setting the disk capacity and initializing the integrity
profile, and let blk_stack_limits which already is called just below
deal with updating the other limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
drivers/nvme/host/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
The above commit is part of the new atomic queue limit updates patch series. For an
NVMe device with multipath enabled, we rely on blk_stack_limits() to update the
queue limits of the stacked device. To update the logical/physical block size
limits of the top (nvme%dn%d) device, blk_stack_limits() takes the max of the
top and bottom limits:
	t->logical_block_size = max(t->logical_block_size,
				    b->logical_block_size);
	t->physical_block_size = max(t->physical_block_size,
				     b->physical_block_size);
When we format the NVMe disk with a block size of 512, t->logical_block_size is 4096
(the initial block size), while b->logical_block_size is 512 (the block size of the
bottom device is updated first in nvme_update_ns_info_block()). The max() therefore
leaves the top device at 4096. In the opposite direction (512 to 4096), the max() picks
up the new, larger value, which is why that case does not fail.
I think we may want to update the queue limits of both the top and bottom devices in
nvme_update_ns_info_block(). Or is there some other way?
Let me know if you need any further information.
Thanks,
--Nilay