From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A00AEC47080 for ; Mon, 31 May 2021 17:10:03 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 736BB61042 for ; Mon, 31 May 2021 17:10:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233641AbhEaRLi (ORCPT ); Mon, 31 May 2021 13:11:38 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:41921 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234501AbhEaRDV (ORCPT ); Mon, 31 May 2021 13:03:21 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1622480493; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=KgeuUpWscxV0iX92OxtBjDCB+O57pxCNA2/BtpkCAC0=; b=ELDXLd9IBDYQrZ90Wxw3bNUeYEBBDpHUOKJqXxmQVfrbrgF0OzLjeFlxwz8wJH4GRmg5kV ZTMKf7WugvjZ3G9tQIdFT70x1GNMYlRfrWIwyin1FaT4OWVN02eZqbQQiJ6Ue0hTvbHsps +QxH5Nj5R8wG0lA6WXjcHqQmBOc1fcw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-563-b3soyx8DNjKj70Gm-K6TNA-1; Mon, 31 May 2021 13:01:31 -0400 X-MC-Unique: b3soyx8DNjKj70Gm-K6TNA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 15B9F6D241; Mon, 31 May 2021 17:01:30 +0000 (UTC) Received: from max.com (unknown [10.40.192.80]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7F8EA59460; Mon, 31 May 2021 17:01:25 +0000 (UTC) From: Andreas Gruenbacher To: Linus Torvalds Cc: Andreas Gruenbacher , cluster-devel@redhat.com, linux-kernel@vger.kernel.org, Alexander Viro , Jan Kara , Matthew Wilcox Subject: [RFC 0/9] gfs2: handle page faults during read and write Date: Mon, 31 May 2021 19:01:14 +0200 Message-Id: <20210531170123.243771-1-agruenba@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Linus, here's a set of fixes for how gfs2 handles page faults during read and write syscalls. The patch queue is ready for merging except for two open issues. May I ask you to shed some light or ask the right person to help us out? Right now, the filesystem can end up trying to take a lock it's already holding, which causes it to BUG. We can recognize and mostly deal with that simple case, but more complex scenarios exist which involve multiple locks and / or incompatible locking modes. To handle those, we switch to trylocks where necessary. The patches appear to be working as itended, with the following remaining questions: (1) Jan Kara has pointed out [*] that returning VM_FAULT_SIGBUS from the .fault / .page_mkwrite ops will raise SIGBUS, which would be visible to user space. This hasn't been observed in our testing; instead, accesses to the mmapped memory via call chains like: iov_iter_fault_in_readable -> fault_in_pages_readable -> __get_user would simply fail with -EFAULT, as we're expecting. >From looking at do_user_addr_fault and do_sigbus in arch/x86/mm/fault.c, my impression is that VM_FAULT_SIGBUS will not cause SIGBUS to be raised in kernel mode, and that we can rely on the -EFAULT behavior for triggering the retries at the filesystem level. (2) The patch queue introduces the same kind of trylock behavior for both .fault (gfs2_fault) and .page_mkwrite (gfs2_page_mkwrite). I'm not aware of any situation in which we can actually end up in .page_mkwrite during a .read_iter or .write_iter operation, so the trylock behavior in .page_mkwrite might be harmless but unnecessary. Thank you very much, Andreas [*] https://listman.redhat.com/archives/cluster-devel/2021-May/msg00080.html Previous posting of this patch queue: https://listman.redhat.com/archives/cluster-devel/2021-May/msg00073.html New xfstest for mmap + page faults during read / write: https://lore.kernel.org/fstests/20210531152604.240462-1-agruenba@redhat.com/ Andreas Gruenbacher (9): gfs2: Clean up the error handling in gfs2_page_mkwrite gfs2: Add wrapper for iomap_file_buffered_write gfs2: Add gfs2_holder_is_compatible helper gfs2: Fix mmap + page fault deadlocks (part 1) iov_iter: Add iov_iter_fault_in_writeable() gfs2: Add wrappers for accessing journal_info gfs2: Encode glock holding and retry flags in journal_info gfs2: Add LM_FLAG_OUTER glock holder flag gfs2: Fix mmap + page fault deadlocks (part 2) fs/gfs2/aops.c | 6 +- fs/gfs2/bmap.c | 31 ++++---- fs/gfs2/file.c | 175 +++++++++++++++++++++++++++++++++----------- fs/gfs2/glock.c | 12 +++ fs/gfs2/glock.h | 27 ++++++- fs/gfs2/incore.h | 41 +++++++++++ fs/gfs2/inode.c | 2 +- fs/gfs2/log.c | 4 +- fs/gfs2/lops.c | 2 +- fs/gfs2/meta_io.c | 6 +- fs/gfs2/super.c | 2 +- fs/gfs2/trans.c | 16 ++-- include/linux/uio.h | 1 + lib/iov_iter.c | 20 ++++- 14 files changed, 265 insertions(+), 80 deletions(-) -- 2.26.3 From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andreas Gruenbacher Date: Mon, 31 May 2021 19:01:14 +0200 Subject: [Cluster-devel] [RFC 0/9] gfs2: handle page faults during read and write Message-ID: <20210531170123.243771-1-agruenba@redhat.com> List-Id: To: cluster-devel.redhat.com MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Hi Linus, here's a set of fixes for how gfs2 handles page faults during read and write syscalls. The patch queue is ready for merging except for two open issues. May I ask you to shed some light or ask the right person to help us out? Right now, the filesystem can end up trying to take a lock it's already holding, which causes it to BUG. We can recognize and mostly deal with that simple case, but more complex scenarios exist which involve multiple locks and / or incompatible locking modes. To handle those, we switch to trylocks where necessary. The patches appear to be working as itended, with the following remaining questions: (1) Jan Kara has pointed out [*] that returning VM_FAULT_SIGBUS from the .fault / .page_mkwrite ops will raise SIGBUS, which would be visible to user space. This hasn't been observed in our testing; instead, accesses to the mmapped memory via call chains like: iov_iter_fault_in_readable -> fault_in_pages_readable -> __get_user would simply fail with -EFAULT, as we're expecting. >From looking at do_user_addr_fault and do_sigbus in arch/x86/mm/fault.c, my impression is that VM_FAULT_SIGBUS will not cause SIGBUS to be raised in kernel mode, and that we can rely on the -EFAULT behavior for triggering the retries at the filesystem level. (2) The patch queue introduces the same kind of trylock behavior for both .fault (gfs2_fault) and .page_mkwrite (gfs2_page_mkwrite). I'm not aware of any situation in which we can actually end up in .page_mkwrite during a .read_iter or .write_iter operation, so the trylock behavior in .page_mkwrite might be harmless but unnecessary. Thank you very much, Andreas [*] https://listman.redhat.com/archives/cluster-devel/2021-May/msg00080.html Previous posting of this patch queue: https://listman.redhat.com/archives/cluster-devel/2021-May/msg00073.html New xfstest for mmap + page faults during read / write: https://lore.kernel.org/fstests/20210531152604.240462-1-agruenba at redhat.com/ Andreas Gruenbacher (9): gfs2: Clean up the error handling in gfs2_page_mkwrite gfs2: Add wrapper for iomap_file_buffered_write gfs2: Add gfs2_holder_is_compatible helper gfs2: Fix mmap + page fault deadlocks (part 1) iov_iter: Add iov_iter_fault_in_writeable() gfs2: Add wrappers for accessing journal_info gfs2: Encode glock holding and retry flags in journal_info gfs2: Add LM_FLAG_OUTER glock holder flag gfs2: Fix mmap + page fault deadlocks (part 2) fs/gfs2/aops.c | 6 +- fs/gfs2/bmap.c | 31 ++++---- fs/gfs2/file.c | 175 +++++++++++++++++++++++++++++++++----------- fs/gfs2/glock.c | 12 +++ fs/gfs2/glock.h | 27 ++++++- fs/gfs2/incore.h | 41 +++++++++++ fs/gfs2/inode.c | 2 +- fs/gfs2/log.c | 4 +- fs/gfs2/lops.c | 2 +- fs/gfs2/meta_io.c | 6 +- fs/gfs2/super.c | 2 +- fs/gfs2/trans.c | 16 ++-- include/linux/uio.h | 1 + lib/iov_iter.c | 20 ++++- 14 files changed, 265 insertions(+), 80 deletions(-) -- 2.26.3