All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] NFS/RDMA bug fixes
@ 2014-01-17 19:37 Chuck Lever
  2014-01-17 19:38 ` [PATCH 1/3] NFS: Fix READDIR oops with NFSv4 on RDMA Chuck Lever
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Chuck Lever @ 2014-01-17 19:37 UTC (permalink / raw
  To: linux-nfs

This brief series fixes some long-standing regressions in our
NFS/RDMA client implementation.  Comments / review?

---

Chuck Lever (3):
      SUNRPC: remove KERN_INFO from dprintk() call sites
      SUNRPC: Fix large reads on NFS/RDMA
      NFS: Fix READDIR oops with NFSv4 on RDMA


 fs/nfs/nfs4xdr.c                |    3 +--
 net/sunrpc/xprtrdma/rpc_rdma.c  |    4 +---
 net/sunrpc/xprtrdma/transport.c |   10 +++++-----
 3 files changed, 7 insertions(+), 10 deletions(-)

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] NFS: Fix READDIR oops with NFSv4 on RDMA
  2014-01-17 19:37 [PATCH 0/3] NFS/RDMA bug fixes Chuck Lever
@ 2014-01-17 19:38 ` Chuck Lever
  2014-01-17 19:38 ` [PATCH 2/3] SUNRPC: Fix large reads on NFS/RDMA Chuck Lever
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2014-01-17 19:38 UTC (permalink / raw
  To: linux-nfs

When starting the Connectathon basic tests on an NFSv4 RDMA
mount, I encountered this oops:

  BUG: unable to handle kernel NULL pointer dereference at (null)
  IP: [<ffffffff8129cc56>] memcpy+0x6/0x110
  PGD 2106cd067 PUD 20fef9067 PMD 0
  Oops: 0000 [#1] SMP

 ...

  [<ffffffffa05dc1b1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc]
  [<ffffffffa071f19c>] nfs4_decode_dirent+0x4c/0x250 [nfsv4]
  [<ffffffff81178a02>] ? alloc_pages_current+0xb2/0x170
  [<ffffffffa06a1225>] nfs_readdir_page_filler+0xe5/0x2c0 [nfs]
  [<ffffffffa06a1622>] nfs_readdir_xdr_to_array+0x222/0x2e0 [nfs]
  [<ffffffffa06a1702>] nfs_readdir_filler+0x22/0x90 [nfs]
  [<ffffffff8112f975>] ? add_to_page_cache_lru+0x35/0x50
  [<ffffffff8112faee>] __read_cache_page+0x7e/0xe0
  [<ffffffffa06a16e0>] ? nfs_readdir_xdr_to_array+0x2e0/0x2e0 [nfs]
  [<ffffffffa06a16e0>] ? nfs_readdir_xdr_to_array+0x2e0/0x2e0 [nfs]
  [<ffffffff8113079c>] do_read_cache_page+0x3c/0x110
  [<ffffffff811308b9>] read_cache_page_async+0x19/0x20
  [<ffffffff811308ce>] read_cache_page+0xe/0x20
  [<ffffffffa06a1c1e>] nfs_readdir+0x14e/0x3d0 [nfs]
  [<ffffffffa071f150>] ? decode_pathconf+0x1c0/0x1c0 [nfsv4]
  [<ffffffff811a811d>] iterate_dir+0xad/0xd0
  [<ffffffff811a71ca>] ? do_fcntl+0x28a/0x370
  [<ffffffff811a82d5>] SyS_getdents+0x95/0x100
  [<ffffffff811a83e0>] ? SyS_old_readdir+0xa0/0xa0
  [<ffffffff815a7752>] system_call_fastpath+0x16/0x1b

The problem does not occur with NFSv3 over RDMA.

nfs4_decode_dirent() is confused because the xdr_buf's page
vector starts long after the first directory entry in the
server's reply.

Commit aa9c2669, "NFS: Client implementation of Labeled-NFS,"
is reported by git bisect as the first bad commit.

This commit changes the decode_readdir_maxsz macro.  This macro
controls where the generic XDR routines split incoming readdir
reply data between the head[0] buffer and the page cache.

Security labels go with each directory entry, thus they are
always stored in the page cache, not in the head buffer.  The
length of the reply that goes in head[0] should not change.

I've reverted the change to decode_readdir_maxsz.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68371
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org> # 3.11+
---

 fs/nfs/nfs4xdr.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 5be2868..79e1d02 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -203,8 +203,7 @@ static int nfs4_stat_to_errno(int);
 				 2 + encode_verifier_maxsz + 5 + \
 				nfs4_label_maxsz)
 #define decode_readdir_maxsz	(op_decode_hdr_maxsz + \
-				 decode_verifier_maxsz + \
-				nfs4_label_maxsz + nfs4_fattr_maxsz)
+				 decode_verifier_maxsz)
 #define encode_readlink_maxsz	(op_encode_hdr_maxsz)
 #define decode_readlink_maxsz	(op_decode_hdr_maxsz + 1)
 #define encode_write_maxsz	(op_encode_hdr_maxsz + \


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] SUNRPC: Fix large reads on NFS/RDMA
  2014-01-17 19:37 [PATCH 0/3] NFS/RDMA bug fixes Chuck Lever
  2014-01-17 19:38 ` [PATCH 1/3] NFS: Fix READDIR oops with NFSv4 on RDMA Chuck Lever
@ 2014-01-17 19:38 ` Chuck Lever
  2014-01-21 20:17   ` Jeff Layton
  2014-01-17 19:38 ` [PATCH 3/3] SUNRPC: remove KERN_INFO from dprintk() call sites Chuck Lever
       [not found] ` <CABgxfbFbLKV98GavS0x_X_ZXw0hZXPEQCpfoJJi+uNp133qt2w@mail.gmail.com>
  3 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2014-01-17 19:38 UTC (permalink / raw
  To: linux-nfs

After commit a11a2bf4, "SUNRPC: Optimise away unnecessary data
moves in xdr_align_pages", Thu Aug 2 13:21:43 2012, READs larger
than a few hundred bytes via NFS/RDMA stopped working.  This
commit exposed a long-standing bug in rpcrdma_inline_fixup().

I reproduce this with an rsize=4096 mount using the cthon04
basic tests.  Test 5 fails with an EIO error.

For my reproducer, kernel log shows:

  NFS: server cheating in read reply: count 4096 > recvd 0

rpcrdma_inline_fixup() is zeroing the xdr_stream::page_len
field, and xdr_align_pages() is now returning that value to
the READ XDR decoder function.

That field is set up by xdr_inline_pages() by the READ XDR
encoder function.  As far as I can tell, it is supposed to
be left alone after that, as it describes the dimensions of
the reply xdr_stream, not the contents of that stream.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68391
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org> # 3.7+
---

 net/sunrpc/xprtrdma/rpc_rdma.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
index e03725b..96ead52 100644
--- a/net/sunrpc/xprtrdma/rpc_rdma.c
+++ b/net/sunrpc/xprtrdma/rpc_rdma.c
@@ -649,9 +649,7 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
 				break;
 			page_base = 0;
 		}
-		rqst->rq_rcv_buf.page_len = olen - copy_len;
-	} else
-		rqst->rq_rcv_buf.page_len = 0;
+	}
 
 	if (copy_len && rqst->rq_rcv_buf.tail[0].iov_len) {
 		curlen = copy_len;


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] SUNRPC: remove KERN_INFO from dprintk() call sites
  2014-01-17 19:37 [PATCH 0/3] NFS/RDMA bug fixes Chuck Lever
  2014-01-17 19:38 ` [PATCH 1/3] NFS: Fix READDIR oops with NFSv4 on RDMA Chuck Lever
  2014-01-17 19:38 ` [PATCH 2/3] SUNRPC: Fix large reads on NFS/RDMA Chuck Lever
@ 2014-01-17 19:38 ` Chuck Lever
  2014-01-21 20:18   ` Jeff Layton
       [not found] ` <CABgxfbFbLKV98GavS0x_X_ZXw0hZXPEQCpfoJJi+uNp133qt2w@mail.gmail.com>
  3 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2014-01-17 19:38 UTC (permalink / raw
  To: linux-nfs

The use of KERN_INFO causes garbage characters to appear
when debugging is enabled.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 net/sunrpc/xprtrdma/transport.c |   10 +++++-----
 1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 285dc08..1eb9c46 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -733,7 +733,7 @@ static void __exit xprt_rdma_cleanup(void)
 {
 	int rc;
 
-	dprintk(KERN_INFO "RPCRDMA Module Removed, deregister RPC RDMA transport\n");
+	dprintk("RPCRDMA Module Removed, deregister RPC RDMA transport\n");
 #ifdef RPC_DEBUG
 	if (sunrpc_table_header) {
 		unregister_sysctl_table(sunrpc_table_header);
@@ -755,14 +755,14 @@ static int __init xprt_rdma_init(void)
 	if (rc)
 		return rc;
 
-	dprintk(KERN_INFO "RPCRDMA Module Init, register RPC RDMA transport\n");
+	dprintk("RPCRDMA Module Init, register RPC RDMA transport\n");
 
-	dprintk(KERN_INFO "Defaults:\n");
-	dprintk(KERN_INFO "\tSlots %d\n"
+	dprintk("Defaults:\n");
+	dprintk("\tSlots %d\n"
 		"\tMaxInlineRead %d\n\tMaxInlineWrite %d\n",
 		xprt_rdma_slot_table_entries,
 		xprt_rdma_max_inline_read, xprt_rdma_max_inline_write);
-	dprintk(KERN_INFO "\tPadding %d\n\tMemreg %d\n",
+	dprintk("\tPadding %d\n\tMemreg %d\n",
 		xprt_rdma_inline_write_padding, xprt_rdma_memreg_strategy);
 
 #ifdef RPC_DEBUG


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] SUNRPC: Fix large reads on NFS/RDMA
  2014-01-17 19:38 ` [PATCH 2/3] SUNRPC: Fix large reads on NFS/RDMA Chuck Lever
@ 2014-01-21 20:17   ` Jeff Layton
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-01-21 20:17 UTC (permalink / raw
  To: Chuck Lever; +Cc: linux-nfs

On Fri, 17 Jan 2014 14:38:16 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> After commit a11a2bf4, "SUNRPC: Optimise away unnecessary data
> moves in xdr_align_pages", Thu Aug 2 13:21:43 2012, READs larger
> than a few hundred bytes via NFS/RDMA stopped working.  This
> commit exposed a long-standing bug in rpcrdma_inline_fixup().
> 
> I reproduce this with an rsize=4096 mount using the cthon04
> basic tests.  Test 5 fails with an EIO error.
> 
> For my reproducer, kernel log shows:
> 
>   NFS: server cheating in read reply: count 4096 > recvd 0
> 
> rpcrdma_inline_fixup() is zeroing the xdr_stream::page_len
> field, and xdr_align_pages() is now returning that value to
> the READ XDR decoder function.
> 
> That field is set up by xdr_inline_pages() by the READ XDR
> encoder function.  As far as I can tell, it is supposed to
> be left alone after that, as it describes the dimensions of
> the reply xdr_stream, not the contents of that stream.
> 
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68391
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Cc: <stable@vger.kernel.org> # 3.7+
> ---
> 
>  net/sunrpc/xprtrdma/rpc_rdma.c |    4 +---
>  1 files changed, 1 insertions(+), 3 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/rpc_rdma.c b/net/sunrpc/xprtrdma/rpc_rdma.c
> index e03725b..96ead52 100644
> --- a/net/sunrpc/xprtrdma/rpc_rdma.c
> +++ b/net/sunrpc/xprtrdma/rpc_rdma.c
> @@ -649,9 +649,7 @@ rpcrdma_inline_fixup(struct rpc_rqst *rqst, char *srcp, int copy_len, int pad)
>  				break;
>  			page_base = 0;
>  		}
> -		rqst->rq_rcv_buf.page_len = olen - copy_len;
> -	} else
> -		rqst->rq_rcv_buf.page_len = 0;
> +	}
>  
>  	if (copy_len && rqst->rq_rcv_buf.tail[0].iov_len) {
>  		curlen = copy_len;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Seems to fix the bug for me too...

Tested-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] SUNRPC: remove KERN_INFO from dprintk() call sites
  2014-01-17 19:38 ` [PATCH 3/3] SUNRPC: remove KERN_INFO from dprintk() call sites Chuck Lever
@ 2014-01-21 20:18   ` Jeff Layton
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff Layton @ 2014-01-21 20:18 UTC (permalink / raw
  To: Chuck Lever; +Cc: linux-nfs

On Fri, 17 Jan 2014 14:38:25 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:

> The use of KERN_INFO causes garbage characters to appear
> when debugging is enabled.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> 
>  net/sunrpc/xprtrdma/transport.c |   10 +++++-----
>  1 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index 285dc08..1eb9c46 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -733,7 +733,7 @@ static void __exit xprt_rdma_cleanup(void)
>  {
>  	int rc;
>  
> -	dprintk(KERN_INFO "RPCRDMA Module Removed, deregister RPC RDMA transport\n");
> +	dprintk("RPCRDMA Module Removed, deregister RPC RDMA transport\n");
>  #ifdef RPC_DEBUG
>  	if (sunrpc_table_header) {
>  		unregister_sysctl_table(sunrpc_table_header);
> @@ -755,14 +755,14 @@ static int __init xprt_rdma_init(void)
>  	if (rc)
>  		return rc;
>  
> -	dprintk(KERN_INFO "RPCRDMA Module Init, register RPC RDMA transport\n");
> +	dprintk("RPCRDMA Module Init, register RPC RDMA transport\n");
>  
> -	dprintk(KERN_INFO "Defaults:\n");
> -	dprintk(KERN_INFO "\tSlots %d\n"
> +	dprintk("Defaults:\n");
> +	dprintk("\tSlots %d\n"
>  		"\tMaxInlineRead %d\n\tMaxInlineWrite %d\n",
>  		xprt_rdma_slot_table_entries,
>  		xprt_rdma_max_inline_read, xprt_rdma_max_inline_write);
> -	dprintk(KERN_INFO "\tPadding %d\n\tMemreg %d\n",
> +	dprintk("\tPadding %d\n\tMemreg %d\n",
>  		xprt_rdma_inline_write_padding, xprt_rdma_memreg_strategy);
>  
>  #ifdef RPC_DEBUG
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Nice cleanup

Reviewed-by: Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] NFS/RDMA bug fixes
       [not found] ` <CABgxfbFbLKV98GavS0x_X_ZXw0hZXPEQCpfoJJi+uNp133qt2w@mail.gmail.com>
@ 2014-01-24 18:34   ` Chuck Lever
       [not found]     ` <CABgxfbEcKbeFMS=yobqp9TeHn4aa-Kjvi_tfbLK2xRbmFtmk9A@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Chuck Lever @ 2014-01-24 18:34 UTC (permalink / raw
  To: Wendy Cheng; +Cc: Linux NFS Mailing List


On Jan 24, 2014, at 1:26 PM, Wendy Cheng <s.wendy.cheng@gmail.com> wrote:

> On Fri, Jan 17, 2014 at 11:37 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> This brief series fixes some long-standing regressions in our
> NFS/RDMA client implementation.  Comments / review?
> 
> ---
> 
> Chuck Lever (3):
>       SUNRPC: remove KERN_INFO from dprintk() call sites
>       SUNRPC: Fix large reads on NFS/RDMA
>       NFS: Fix READDIR oops with NFSv4 on RDMA
> 
> 
>  fs/nfs/nfs4xdr.c                |    3 +--
>  net/sunrpc/xprtrdma/rpc_rdma.c  |    4 +---
>  net/sunrpc/xprtrdma/transport.c |   10 +++++-----
>  3 files changed, 7 insertions(+), 10 deletions(-)
> 
> 
> We have an embedded system that is currently set at NFS V3 with kernel 2.6.38 (as NFS client). Could folks suggest a good kernel version (close to 2.6.38 or RHEL based) to pick up NFS v4 (in order to pick up NFS over RDMA) ? 

If NFS/RDMA is working in your kernel, I believe NFSv3 and NFSv4 should both work over RDMA.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 0/3] NFS/RDMA bug fixes
       [not found]     ` <CABgxfbEcKbeFMS=yobqp9TeHn4aa-Kjvi_tfbLK2xRbmFtmk9A@mail.gmail.com>
@ 2014-01-24 21:50       ` Chuck Lever
  0 siblings, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2014-01-24 21:50 UTC (permalink / raw
  To: Wendy Cheng; +Cc: Linux NFS Mailing List


On Jan 24, 2014, at 4:00 PM, Wendy Cheng <s.wendy.cheng@gmail.com> wrote:

> On Fri, Jan 24, 2014 at 10:34 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
> On Jan 24, 2014, at 1:26 PM, Wendy Cheng <s.wendy.cheng@gmail.com> wrote:
> 
> > On Fri, Jan 17, 2014 at 11:37 AM, Chuck Lever <chuck.lever@oracle.com> wrote:
> > This brief series fixes some long-standing regressions in our
> > NFS/RDMA client implementation.  Comments / review?
> >
> > ---
> >
> > Chuck Lever (3):
> >       SUNRPC: remove KERN_INFO from dprintk() call sites
> >       SUNRPC: Fix large reads on NFS/RDMA
> >       NFS: Fix READDIR oops with NFSv4 on RDMA
> >
> >
> >  fs/nfs/nfs4xdr.c                |    3 +--
> >  net/sunrpc/xprtrdma/rpc_rdma.c  |    4 +---
> >  net/sunrpc/xprtrdma/transport.c |   10 +++++-----
> >  3 files changed, 7 insertions(+), 10 deletions(-)
> >
> >
> > We have an embedded system that is currently set at NFS V3 with kernel 2.6.38 (as NFS client). Could folks suggest a good kernel version (close to 2.6.38 or RHEL based) to pick up NFS v4 (in order to pick up NFS over RDMA) ?
> 
> If NFS/RDMA is working in your kernel, I believe NFSv3 and NFSv4 should both work over RDMA.
> 
> 
> I did have V3 running (with NFS over RDMA). The issue here is how to move it from an experimental project into a release candidate. I think NFS V4 would be a better platform to start with (?) ... So let me re-phrase the question .. 
> 
> Regardless RDMA support, will (client side) NFS V4 work well with 2.6.38 kernel? If not, what would be a good version (close to 2.6.38) to go to for a stable NFS V4 client ?

Client support for NFSv4 in recent updates of RHEL6 is stable.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-01-24 21:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-17 19:37 [PATCH 0/3] NFS/RDMA bug fixes Chuck Lever
2014-01-17 19:38 ` [PATCH 1/3] NFS: Fix READDIR oops with NFSv4 on RDMA Chuck Lever
2014-01-17 19:38 ` [PATCH 2/3] SUNRPC: Fix large reads on NFS/RDMA Chuck Lever
2014-01-21 20:17   ` Jeff Layton
2014-01-17 19:38 ` [PATCH 3/3] SUNRPC: remove KERN_INFO from dprintk() call sites Chuck Lever
2014-01-21 20:18   ` Jeff Layton
     [not found] ` <CABgxfbFbLKV98GavS0x_X_ZXw0hZXPEQCpfoJJi+uNp133qt2w@mail.gmail.com>
2014-01-24 18:34   ` [PATCH 0/3] NFS/RDMA bug fixes Chuck Lever
     [not found]     ` <CABgxfbEcKbeFMS=yobqp9TeHn4aa-Kjvi_tfbLK2xRbmFtmk9A@mail.gmail.com>
2014-01-24 21:50       ` Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.