Linux-bcachefs Archive mirror
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Christoph Hellwig <hch@lst.de>,
	Tony Asleson <tasleson@redhat.com>,
	linux-bcachefs@vger.kernel.org, linux-block@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Darrick J. Wong" <djwong@kernel.org>, Tejun Heo <tj@kernel.org>
Subject: Re: kernel oops on bcachefs umount, 6.7 kernel
Date: Thu, 22 Feb 2024 08:23:09 -0500	[thread overview]
Message-ID: <ZddKvWVjyOg1ENz+@bfoster> (raw)
In-Reply-To: <w4uqoqykzazwyqcmlz4bamxpqloxfbitipgigzxmccqczlixp2@yr4bizgz5kqk>

On Wed, Feb 21, 2024 at 07:07:37PM -0500, Kent Overstreet wrote:
> On Wed, Feb 21, 2024 at 07:39:49AM -0500, Brian Foster wrote:
> > On Fri, Feb 16, 2024 at 10:57:24AM -0500, Kent Overstreet wrote:
> > > On Fri, Feb 16, 2024 at 07:40:34AM -0500, Brian Foster wrote:
> > > > On Fri, Feb 16, 2024 at 09:00:17AM +0100, Christoph Hellwig wrote:
> > > > > On Thu, Feb 15, 2024 at 07:24:23PM -0500, Kent Overstreet wrote:
> > > > > > > It looks like the warning could be avoided in bcachefs by checking for
> > > > > > > whether the parent dir/node still exists at cleanup time, but I'm not
> > > > > > > familiar enough with kobj management to say whether that's the
> > > > > > > right/best solution. It also looks a little odd to me to see a
> > > > > > > /sys/block/<dev>/bcachefs dir when I've not seen any other fs or driver
> > > > > > > do such a thing in the block sysfs dir(s).
> > > > > > > 
> > > > > > > Any thoughts on this from the block subsystem folks? Is it reasonable to
> > > > > > > leave this link around and just fix the removal check, or is another
> > > > > > > behavior preferred? Thanks.
> > > > > 
> > > > > This is the general problem with random cross-subsystem sysfs reference,
> > > > > and why they are best avoided.  The block layer tears down all the sysfs
> > > > > objects at del_gendisk time as no one should start using the sysfs files
> > > > > at that point, but a mounted file system or other opener will of course
> > > > > keep the bdev itself alive.
> > > > > 
> > > > 
> > > > Yeah, makes sense. The fact that the dir goes away despite having the
> > > > bdev open is partly what made this seem a little odd to me.
> > > > 
> > > > > I'm not sure why bcachefs is doing this, but no one really should be
> > > > > using the block layer sysfs structures and pointers except for the block
> > > > > layer itself.  
> > > > > 
> > > > 
> > > > From Kent's comments it sounds like it was just some loose carryover
> > > > from old bcache stuff. I had poked around a bit for anything similar and
> > > > it looked to me that current bcache doesn't do this either, but I could
> > > > have missed something.
> > > > 
> > > > > > so there's an existing bd_holder mechanism that e.g. device mapper uses
> > > > > > for links between block devices. I think the "this block device is going
> > > > > > away" code knows how to clean those up.
> > > > > > 
> > > > > > We're not using that mechanism - and perhaps we should have been, I'd
> > > > > > need a time machine to ask myself why I did it that way 15 years back.
> > > > > 
> > > > > Well, at least Tejun had a very strong opinion that no one should be
> > > > > abusing sysfs symlinks for linking up subsystems at all, see commit
> > > > > 49731baa41df404c2c3f44555869ab387363af43, which is also why this code
> > > > > is marked deprecated and we've not added additional users.
> > > > > 
> > > > 
> > > > Thanks. I'll send a patch to remove this once I'm back from vacation.
> > > 
> > > No, we can't remove it - userspace needs to know this topology. When
> > > we've got one sysfs node with a direct relationship to another sysfs
> > > node, that needs to be reflected in syfs.
> > > 
> > 
> > Do you mean that some related userspace tool relies on this to function,
> > or generally disagreeing with the statement(s) above around links from
> > /sys/block/<dev>/?
> 
> I would have to do some digging to give a definitive answer to that
> question.
> 
> bcache definitely needed those links (IIRC in udev rules?) - bcachefs
> may not, but we haven't even started on integrating bcachefs with all
> the userspace disk management tooling out there.
> 

I don't know bcache that well, but I didn't see anything obviously
putting links in the bdev dir when I last looked. It did look like it
created some links between its own sysfs dirs, but maybe I
misinterpreted the code.

I don't think that any other fs with its own /sys/fs/ presence (i.e.,
XFS, ext4, btrfs) does this sort of thing, so I'm a little skeptical of
the idea that userspace currently needs it (not necessarily that it
couldn't use it for the better in the future).

> And this is stuff that userspace does in general need; if we're getting
> by without it for other filesystems right now it's probably by doing
> something horrid like parsing /proc/mounts; if we could get filesystems
> using the same bd_holder stuff that other block layer stuff uses, that
> would be _amazing_
> 

Ok, but that sounds contradictory to what the block layer folks want.

I dunno, it seems to me that the notions of better general coordination
between /sys/block and /sys/fs and that of bcachefs' current behavior
being wrong are not mutually exclusive things. But I'll just leave it
alone until there's some more clarity here..

Brian


      reply	other threads:[~2024-02-22 13:21 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-01 21:52 kernel oops on bcachefs umount, 6.7 kernel Tony Asleson
2024-02-15 16:55 ` Brian Foster
2024-02-16  0:24   ` Kent Overstreet
2024-02-16  8:00     ` Christoph Hellwig
2024-02-16 12:40       ` Brian Foster
2024-02-16 15:57         ` Kent Overstreet
2024-02-21 12:39           ` Brian Foster
2024-02-22  0:07             ` Kent Overstreet
2024-02-22 13:23               ` Brian Foster [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZddKvWVjyOg1ENz+@bfoster \
    --to=bfoster@redhat.com \
    --cc=djwong@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-bcachefs@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=tasleson@redhat.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).