cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping
Date: Tue, 23 May 2023 15:34:31 +0200	[thread overview]
Message-ID: <20230523133431.wwrkjtptu6vqqh5e@quack3> (raw)
In-Reply-To: <ZFs3RYgdCeKjxYCw@moria.home.lan>

On Wed 10-05-23 02:18:45, Kent Overstreet wrote:
> On Wed, May 10, 2023 at 03:07:37AM +0200, Jan Kara wrote:
> > On Tue 09-05-23 12:56:31, Kent Overstreet wrote:
> > > From: Kent Overstreet <kent.overstreet@gmail.com>
> > > 
> > > This is used by bcachefs to fix a page cache coherency issue with
> > > O_DIRECT writes.
> > > 
> > > Also relevant: mapping->invalidate_lock, see below.
> > > 
> > > O_DIRECT writes (and other filesystem operations that modify file data
> > > while bypassing the page cache) need to shoot down ranges of the page
> > > cache - and additionally, need locking to prevent those pages from
> > > pulled back in.
> > > 
> > > But O_DIRECT writes invoke the page fault handler (via get_user_pages),
> > > and the page fault handler will need to take that same lock - this is a
> > > classic recursive deadlock if userspace has mmaped the file they're DIO
> > > writing to and uses those pages for the buffer to write from, and it's a
> > > lock ordering deadlock in general.
> > > 
> > > Thus we need a way to signal from the dio code to the page fault handler
> > > when we already are holding the pagecache add lock on an address space -
> > > this patch just adds a member to task_struct for this purpose. For now
> > > only bcachefs is implementing this locking, though it may be moved out
> > > of bcachefs and made available to other filesystems in the future.
> > 
> > It would be nice to have at least a link to the code that's actually using
> > the field you are adding.
> 
> Bit of a trick to link to a _later_ patch in the series from a commit
> message, but...
> 
> https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/fs-io.c#n975
> https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/fs-io.c#n2454

Thanks and I'm sorry for the delay.

> > Also I think we were already through this discussion [1] and we ended up
> > agreeing that your scheme actually solves only the AA deadlock but a
> > malicious userspace can easily create AB BA deadlock by running direct IO
> > to file A using mapped file B as a buffer *and* direct IO to file B using
> > mapped file A as a buffer.
> 
> No, that's definitely handled (and you can see it in the code I linked),
> and I wrote a torture test for fstests as well.

I've checked the code and AFAICT it is all indeed handled. BTW, I've now
remembered that GFS2 has dealt with the same deadlocks - b01b2d72da25
("gfs2: Fix mmap + page fault deadlocks for direct I/O") - in a different
way (by prefaulting pages from the iter before grabbing the problematic
lock and then disabling page faults for the iomap_dio_rw() call). I guess
we should somehow unify these schemes so that we don't have two mechanisms
for avoiding exactly the same deadlock. Adding GFS2 guys to CC.

Also good that you've written a fstest for this, that is definitely a useful
addition, although I suspect GFS2 guys added a test for this not so long
ago when testing their stuff. Maybe they have a pointer handy?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


       reply	other threads:[~2023-05-23 13:34 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20230509165657.1735798-1-kent.overstreet@linux.dev>
     [not found] ` <20230509165657.1735798-7-kent.overstreet@linux.dev>
     [not found]   ` <20230510010737.heniyuxazlprrbd6@quack3>
     [not found]     ` <ZFs3RYgdCeKjxYCw@moria.home.lan>
2023-05-23 13:34       ` Jan Kara [this message]
2023-05-23 16:21         ` [Cluster-devel] [PATCH 06/32] sched: Add task_struct->faults_disabled_mapping Christoph Hellwig
2023-05-23 16:35           ` Kent Overstreet
2023-05-24  6:43             ` Christoph Hellwig
2023-05-24  8:09               ` Kent Overstreet
2023-05-25  8:58                 ` Christoph Hellwig
2023-05-25 20:50                   ` Kent Overstreet
2023-05-26  8:06                     ` Christoph Hellwig
2023-05-26  8:34                       ` Kent Overstreet
2023-05-25 21:40                   ` Kent Overstreet
2023-05-25 22:25           ` Andreas Grünbacher
2023-05-25 23:20             ` Kent Overstreet
2023-05-26  0:05               ` Andreas Grünbacher
2023-05-26  0:39                 ` Kent Overstreet
2023-05-26  8:10               ` Christoph Hellwig
2023-05-26  8:38                 ` Kent Overstreet
2023-05-23 16:49         ` Kent Overstreet
2023-05-25  8:47           ` Jan Kara
2023-05-25 21:36             ` Kent Overstreet
2023-05-25 22:45             ` Andreas Grünbacher
2023-05-25 22:04         ` Andreas Grünbacher

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230523133431.wwrkjtptu6vqqh5e@quack3 \
    --to=jack@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).