[RFC 4/9] gfs2: Fix mmap + page fault deadlocks (part 1)

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

From: Andreas Gruenbacher <agruenba@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andreas Gruenbacher <agruenba@redhat.com>,
	cluster-devel@redhat.com, linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>, Jan Kara <jack@suse.cz>,
	Matthew Wilcox <willy@infradead.org>
Subject: [RFC 4/9] gfs2: Fix mmap + page fault deadlocks (part 1)
Date: Mon, 31 May 2021 19:01:18 +0200	[thread overview]
Message-ID: <20210531170123.243771-5-agruenba@redhat.com> (raw)
In-Reply-To: <20210531170123.243771-1-agruenba@redhat.com>

When the buffer passed to a read or write system call is memory mapped
to the same file, a page fault can occur in filemap_fault.  In that
case, the task will already be holding the inode glock, and trying to
take the same lock again will result in a BUG in add_to_queue().

Fix that by recognizing the self-recursion case.  Either skip the lock
taking (when the glock is held in a compatible way), or fail the
operation.

Likewise, a request to un-share a copy-on-write page can *probably*
happen in similar situations, so treat the locking in gfs2_page_mkwrite
in the same way.

A future patch will handle these case more gracefully by retrying
operations instead of failing them, along with addressing more complex
deadlock scenarios.

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 6d77743f11a4..7d88abb4629b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -423,6 +423,7 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 	struct page *page = vmf->page;
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_holder *outer_gh = gfs2_glock_is_locked_by_me(ip->i_gl);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_alloc_parms ap = { .aflags = 0, };
 	u64 offset = page_offset(page);
@@ -436,10 +437,18 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 	sb_start_pagefault(inode->i_sb);
 
 	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
-	err = gfs2_glock_nq(&gh);
-	if (err) {
-		ret = block_page_mkwrite_return(err);
-		goto out_uninit;
+	if (likely(!outer_gh)) {
+		err = gfs2_glock_nq(&gh);
+		if (err) {
+			ret = block_page_mkwrite_return(err);
+			goto out_uninit;
+		}
+	} else {
+		if (!gfs2_holder_is_compatible(outer_gh, LM_ST_EXCLUSIVE)) {
+			/* We could try to upgrade outer_gh here. */
+			ret = VM_FAULT_SIGBUS;
+			goto out_uninit;
+		}
 	}
 
 	/* Check page index against inode size */
@@ -540,7 +549,8 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 out_quota_unlock:
 	gfs2_quota_unlock(ip);
 out_unlock:
-	gfs2_glock_dq(&gh);
+	if (likely(!outer_gh))
+		gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	if (ret == VM_FAULT_LOCKED) {
@@ -555,6 +565,7 @@ static vm_fault_t gfs2_fault(struct vm_fault *vmf)
 {
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_holder *outer_gh = gfs2_glock_is_locked_by_me(ip->i_gl);
 	struct gfs2_holder gh;
 	vm_fault_t ret;
 	u16 state;
@@ -562,13 +573,22 @@ static vm_fault_t gfs2_fault(struct vm_fault *vmf)
 
 	state = (vmf->flags & FAULT_FLAG_WRITE) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
 	gfs2_holder_init(ip->i_gl, state, 0, &gh);
-	err = gfs2_glock_nq(&gh);
-	if (err) {
-		ret = block_page_mkwrite_return(err);
-		goto out_uninit;
+	if (likely(!outer_gh)) {
+		err = gfs2_glock_nq(&gh);
+		if (err) {
+			ret = block_page_mkwrite_return(err);
+			goto out_uninit;
+		}
+	} else {
+		if (!gfs2_holder_is_compatible(outer_gh, state)) {
+			/* We could try to upgrade outer_gh here. */
+			ret = VM_FAULT_SIGBUS;
+			goto out_uninit;
+		}
 	}
 	ret = filemap_fault(vmf);
-	gfs2_glock_dq(&gh);
+	if (likely(!outer_gh))
+		gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	return ret;
-- 
2.26.3

WARNING: multiple messages have this Message-ID (diff)

From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [RFC 4/9] gfs2: Fix mmap + page fault deadlocks (part 1)
Date: Mon, 31 May 2021 19:01:18 +0200	[thread overview]
Message-ID: <20210531170123.243771-5-agruenba@redhat.com> (raw)
In-Reply-To: <20210531170123.243771-1-agruenba@redhat.com>

When the buffer passed to a read or write system call is memory mapped
to the same file, a page fault can occur in filemap_fault.  In that
case, the task will already be holding the inode glock, and trying to
take the same lock again will result in a BUG in add_to_queue().

Fix that by recognizing the self-recursion case.  Either skip the lock
taking (when the glock is held in a compatible way), or fail the
operation.

Likewise, a request to un-share a copy-on-write page can *probably*
happen in similar situations, so treat the locking in gfs2_page_mkwrite
in the same way.

A future patch will handle these case more gracefully by retrying
operations instead of failing them, along with addressing more complex
deadlock scenarios.

Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
---
 fs/gfs2/file.c | 40 ++++++++++++++++++++++++++++++----------
 1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 6d77743f11a4..7d88abb4629b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -423,6 +423,7 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 	struct page *page = vmf->page;
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_holder *outer_gh = gfs2_glock_is_locked_by_me(ip->i_gl);
 	struct gfs2_sbd *sdp = GFS2_SB(inode);
 	struct gfs2_alloc_parms ap = { .aflags = 0, };
 	u64 offset = page_offset(page);
@@ -436,10 +437,18 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 	sb_start_pagefault(inode->i_sb);
 
 	gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh);
-	err = gfs2_glock_nq(&gh);
-	if (err) {
-		ret = block_page_mkwrite_return(err);
-		goto out_uninit;
+	if (likely(!outer_gh)) {
+		err = gfs2_glock_nq(&gh);
+		if (err) {
+			ret = block_page_mkwrite_return(err);
+			goto out_uninit;
+		}
+	} else {
+		if (!gfs2_holder_is_compatible(outer_gh, LM_ST_EXCLUSIVE)) {
+			/* We could try to upgrade outer_gh here. */
+			ret = VM_FAULT_SIGBUS;
+			goto out_uninit;
+		}
 	}
 
 	/* Check page index against inode size */
@@ -540,7 +549,8 @@ static vm_fault_t gfs2_page_mkwrite(struct vm_fault *vmf)
 out_quota_unlock:
 	gfs2_quota_unlock(ip);
 out_unlock:
-	gfs2_glock_dq(&gh);
+	if (likely(!outer_gh))
+		gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	if (ret == VM_FAULT_LOCKED) {
@@ -555,6 +565,7 @@ static vm_fault_t gfs2_fault(struct vm_fault *vmf)
 {
 	struct inode *inode = file_inode(vmf->vma->vm_file);
 	struct gfs2_inode *ip = GFS2_I(inode);
+	struct gfs2_holder *outer_gh = gfs2_glock_is_locked_by_me(ip->i_gl);
 	struct gfs2_holder gh;
 	vm_fault_t ret;
 	u16 state;
@@ -562,13 +573,22 @@ static vm_fault_t gfs2_fault(struct vm_fault *vmf)
 
 	state = (vmf->flags & FAULT_FLAG_WRITE) ? LM_ST_EXCLUSIVE : LM_ST_SHARED;
 	gfs2_holder_init(ip->i_gl, state, 0, &gh);
-	err = gfs2_glock_nq(&gh);
-	if (err) {
-		ret = block_page_mkwrite_return(err);
-		goto out_uninit;
+	if (likely(!outer_gh)) {
+		err = gfs2_glock_nq(&gh);
+		if (err) {
+			ret = block_page_mkwrite_return(err);
+			goto out_uninit;
+		}
+	} else {
+		if (!gfs2_holder_is_compatible(outer_gh, state)) {
+			/* We could try to upgrade outer_gh here. */
+			ret = VM_FAULT_SIGBUS;
+			goto out_uninit;
+		}
 	}
 	ret = filemap_fault(vmf);
-	gfs2_glock_dq(&gh);
+	if (likely(!outer_gh))
+		gfs2_glock_dq(&gh);
 out_uninit:
 	gfs2_holder_uninit(&gh);
 	return ret;
-- 
2.26.3

next prev parent reply	other threads:[~2021-05-31 17:09 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-31 17:01 [RFC 0/9] gfs2: handle page faults during read and write Andreas Gruenbacher
2021-05-31 17:01 ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:01 ` [RFC 1/9] gfs2: Clean up the error handling in gfs2_page_mkwrite Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:01 ` [RFC 2/9] gfs2: Add wrapper for iomap_file_buffered_write Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:01 ` [RFC 3/9] gfs2: Add gfs2_holder_is_compatible helper Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:01 ` Andreas Gruenbacher [this message]
2021-05-31 17:01   ` [Cluster-devel] [RFC 4/9] gfs2: Fix mmap + page fault deadlocks (part 1) Andreas Gruenbacher
2021-06-01  6:00   ` Linus Torvalds
2021-06-01  6:00     ` [Cluster-devel] " Linus Torvalds
2021-06-02 11:16     ` Andreas Gruenbacher
2021-06-02 11:16       ` [Cluster-devel] " Andreas Gruenbacher
2021-06-11 16:25       ` Al Viro
2021-06-11 16:25         ` [Cluster-devel] " Al Viro
2021-06-12 21:05         ` Al Viro
2021-06-12 21:05           ` [Cluster-devel] " Al Viro
2021-06-12 21:35           ` Al Viro
2021-06-12 21:35             ` [Cluster-devel] " Al Viro
2021-06-13  8:44             ` Steven Whitehouse
2021-06-13  8:44               ` Steven Whitehouse
2021-05-31 17:01 ` [RFC 5/9] iov_iter: Add iov_iter_fault_in_writeable() Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:12   ` Al Viro
2021-05-31 17:12     ` [Cluster-devel] " Al Viro
2021-06-12 21:12     ` Al Viro
2021-06-12 21:12       ` [Cluster-devel] " Al Viro
2021-06-12 21:33       ` Linus Torvalds
2021-06-12 21:33         ` [Cluster-devel] " Linus Torvalds
2021-06-12 21:47         ` Al Viro
2021-06-12 21:47           ` [Cluster-devel] " Al Viro
2021-06-12 23:17           ` Linus Torvalds
2021-06-12 23:17             ` [Cluster-devel] " Linus Torvalds
2021-06-12 23:38             ` Al Viro
2021-06-12 23:38               ` [Cluster-devel] " Al Viro
2021-05-31 17:01 ` [RFC 6/9] gfs2: Add wrappers for accessing journal_info Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:01 ` [RFC 7/9] gfs2: Encode glock holding and retry flags in journal_info Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:01 ` [RFC 8/9] gfs2: Add LM_FLAG_OUTER glock holder flag Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-05-31 17:01 ` [RFC 9/9] gfs2: Fix mmap + page fault deadlocks (part 2) Andreas Gruenbacher
2021-05-31 17:01   ` [Cluster-devel] " Andreas Gruenbacher
2021-06-01  5:47   ` Linus Torvalds
2021-06-01  5:47     ` [Cluster-devel] " Linus Torvalds
2021-05-31 17:57 ` [Cluster-devel] [RFC 0/9] gfs2: handle page faults during read and write Linus Torvalds
2021-05-31 20:35   ` Andreas Gruenbacher
2021-05-31 20:35     ` [Cluster-devel] " Andreas Gruenbacher

find likely ancestor, descendant, or conflicting patches for this message:
dfblob:6d77743f11a dfblob:6d77743f11a dfblob:7d88abb4629
dfblob:7d88abb4629
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210531170123.243771-5-agruenba@redhat.com \
    --to=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=jack@suse.cz \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.