From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Christoph Hellwig <hch@lst.de>,
Tony Asleson <tasleson@redhat.com>,
linux-bcachefs@vger.kernel.org, linux-block@vger.kernel.org,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
"Darrick J. Wong" <djwong@kernel.org>, Tejun Heo <tj@kernel.org>
Subject: Re: kernel oops on bcachefs umount, 6.7 kernel
Date: Thu, 22 Feb 2024 08:23:09 -0500 [thread overview]
Message-ID: <ZddKvWVjyOg1ENz+@bfoster> (raw)
In-Reply-To: <w4uqoqykzazwyqcmlz4bamxpqloxfbitipgigzxmccqczlixp2@yr4bizgz5kqk>
On Wed, Feb 21, 2024 at 07:07:37PM -0500, Kent Overstreet wrote:
> On Wed, Feb 21, 2024 at 07:39:49AM -0500, Brian Foster wrote:
> > On Fri, Feb 16, 2024 at 10:57:24AM -0500, Kent Overstreet wrote:
> > > On Fri, Feb 16, 2024 at 07:40:34AM -0500, Brian Foster wrote:
> > > > On Fri, Feb 16, 2024 at 09:00:17AM +0100, Christoph Hellwig wrote:
> > > > > On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > > > > > > It looks like the warning could be avoided in bcachefs by checking for
> > > > > > > whether the parent dir/node still exists at cleanup time, but I'm not
> > > > > > > familiar enough with kobj management to say whether that's the
> > > > > > > right/best solution. It also looks a little odd to me to see a
> > > > > > > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > > > > > > do such a thing in the block sysfs dir(s).
> > > > > > >
> > > > > > > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > > > > > > leave this link around and just fix the removal check, or is another
> > > > > > > behavior preferred? Thanks.
> > > > >
> > > > > This is the general problem with random cross-subsystem sysfs reference,
> > > > > and why they are best avoided. The block layer tears down all the sysfs
> > > > > objects at del_gendisk time as no one should start using the sysfs files
> > > > > at that point, but a mounted file system or other opener will of course
> > > > > keep the bdev itself alive.
> > > > >
> > > >
> > > > Yeah, makes sense. The fact that the dir goes away despite having the
> > > > bdev open is partly what made this seem a little odd to me.
> > > >
> > > > > I'm not sure why bcachefs is doing this, but no one really should be
> > > > > using the block layer sysfs structures and pointers except for the block
> > > > > layer itself.
> > > > >
> > > >
> > > > From Kent's comments it sounds like it was just some loose carryover
> > > > from old bcache stuff. I had poked around a bit for anything similar and
> > > > it looked to me that current bcache doesn't do this either, but I could
> > > > have missed something.
> > > >
> > > > > > so there's an existing bd_holder mechanism that e.g. device mapper uses
> > > > > > for links between block devices. I think the "this block device is going
> > > > > > away" code knows how to clean those up.
> > > > > >
> > > > > > We're not using that mechanism - and perhaps we should have been, I'd
> > > > > > need a time machine to ask myself why I did it that way 15 years back.
> > > > >
> > > > > Well, at least Tejun had a very strong opinion that no one should be
> > > > > abusing sysfs symlinks for linking up subsystems at all, see commit
> > > > > 49731baa41df404c2c3f44555869ab387363af43, which is also why this code
> > > > > is marked deprecated and we've not added additional users.
> > > > >
> > > >
> > > > Thanks. I'll send a patch to remove this once I'm back from vacation.
> > >
> > > No, we can't remove it - userspace needs to know this topology. When
> > > we've got one sysfs node with a direct relationship to another sysfs
> > > node, that needs to be reflected in syfs.
> > >
> >
> > Do you mean that some related userspace tool relies on this to function,
> > or generally disagreeing with the statement(s) above around links from
> > /sys/block/<dev>/?
>
> I would have to do some digging to give a definitive answer to that
> question.
>
> bcache definitely needed those links (IIRC in udev rules?) - bcachefs
> may not, but we haven't even started on integrating bcachefs with all
> the userspace disk management tooling out there.
>
I don't know bcache that well, but I didn't see anything obviously
putting links in the bdev dir when I last looked. It did look like it
created some links between its own sysfs dirs, but maybe I
misinterpreted the code.
I don't think that any other fs with its own /sys/fs/ presence (i.e.,
XFS, ext4, btrfs) does this sort of thing, so I'm a little skeptical of
the idea that userspace currently needs it (not necessarily that it
couldn't use it for the better in the future).
> And this is stuff that userspace does in general need; if we're getting
> by without it for other filesystems right now it's probably by doing
> something horrid like parsing /proc/mounts; if we could get filesystems
> using the same bd_holder stuff that other block layer stuff uses, that
> would be _amazing_
>
Ok, but that sounds contradictory to what the block layer folks want.
I dunno, it seems to me that the notions of better general coordination
between /sys/block and /sys/fs and that of bcachefs' current behavior
being wrong are not mutually exclusive things. But I'll just leave it
alone until there's some more clarity here..
Brian
prev parent reply other threads:[~2024-02-22 13:21 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-01 21:52 kernel oops on bcachefs umount, 6.7 kernel Tony Asleson
2024-02-15 16:55 ` Brian Foster
2024-02-16 0:24 ` Kent Overstreet
2024-02-16 8:00 ` Christoph Hellwig
2024-02-16 12:40 ` Brian Foster
2024-02-16 15:57 ` Kent Overstreet
2024-02-21 12:39 ` Brian Foster
2024-02-22 0:07 ` Kent Overstreet
2024-02-22 13:23 ` Brian Foster [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZddKvWVjyOg1ENz+@bfoster \
--to=bfoster@redhat.com \
--cc=djwong@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=hch@lst.de \
--cc=kent.overstreet@linux.dev \
--cc=linux-bcachefs@vger.kernel.org \
--cc=linux-block@vger.kernel.org \
--cc=tasleson@redhat.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).