* [PATCH 00/24] LAYOUTGET invocation (rebased)
@ 2010-06-08  4:18 Fred Isaman
  2010-06-08  4:18 ` [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT" Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:18 UTC
  To: linux-nfs

This patch series limits LAYOUTGET invocation to the beginning of the
IO paths.  It applies atop Alexandros's recent patchset submission,
but is otherwise basically the same as the previous submission.  It is
intended for the pnfs_submit branch, without reversion in a
post_submit branch.

Patches 1-4 revert direct IO.  Commit is already broken, and this
series breaks them further.  The problem is that the direct IO code
redefines data->wb_req and data->pages, so that it can only work with
the pnfs code if we don't look at those fields.  The reverted code
should be saved somewhere.  I tend to agree with Boaz that keeping it
in git is preferable, but I can supply a patch which returns the code
ifdef'ed out if that is preferred.

Patches 5-9 do some code cleanup in preparation for the real work.

Patches 10-21 implement the change.  NOTE that patch 20 changes the
calling convention of the layout drivers' commit calls.  There is no
longer a universal lseg for the commit; instead, each nfs_page has an
lseg attached, with NULL meaning to go through the MDS.

Patches 21-24 rework the filelayout commit function, and then do some
code cleanup this enables.



The basic idea of these patches is as follows:

We attempt to grab an lseg (possibly invoking LAYOUTGET) early in the
IO.  If we succeed, we refcount and stash it, using it through the
rest of the IO.  If we fail, we revert to straight NFS, even if the
area becomes covered by a layout due to other IO.
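
As a rough user-space illustration of that rule (not code from the
series), the sketch below takes one reference at the start of the IO,
stashes it, and lets every later stage consult only the stashed
pointer.  All names here (struct lseg, struct io_ctx, io_begin() and
so on) are hypothetical stand-ins, not the kernel structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a layout segment; not the kernel type. */
struct lseg {
	int refcount;
};

/* Hypothetical per-IO state carrying the stashed segment. */
struct io_ctx {
	struct lseg *lseg;	/* NULL means plain NFS through the MDS */
};

/* Take a reference on a segment, tolerating NULL. */
static struct lseg *lseg_get(struct lseg *seg)
{
	if (seg)
		seg->refcount++;
	return seg;
}

/* Drop a reference on a segment, tolerating NULL. */
static void lseg_put(struct lseg *seg)
{
	if (seg) {
		assert(seg->refcount > 0);
		seg->refcount--;
	}
}

/*
 * Called once at the beginning of the IO path.  'found' is whatever
 * the lookup (possibly a LAYOUTGET) produced, or NULL on failure.
 */
static void io_begin(struct io_ctx *io, struct lseg *found)
{
	io->lseg = lseg_get(found);
}

/* Later stages look only at the stashed pointer, never at the cache. */
static int io_uses_pnfs(const struct io_ctx *io)
{
	return io->lseg != NULL;
}

/* Release the stashed reference when the IO completes. */
static void io_end(struct io_ctx *io)
{
	lseg_put(io->lseg);
	io->lseg = NULL;
}
```

An IO that begins with NULL stays on the plain NFS path until
io_end(), even if another IO obtains a layout for the same range in
the meantime.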

The tricky, though hopefully anomalous, case is when we start without
the layout, but have it at this particular stage of the IO.  We ignore
this for the moment at write_pages, which will cause block and object
to issue CB_LAYOUTRECALL.  At commit, it is tricky to handle, but
since block doesn't use commit, and file needs to handle complicated
splitting anyway, I just push all complicated decisions of splitting
commit between nfs (for IO started without layout) and pnfs to the
driver.
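
To make the per-page convention concrete, here is a small user-space
sketch (again, not code from the series) of one way a driver might
split a commit list into requests whose IO started without a layout
(lseg == NULL, commit through the MDS) and requests that carry an
lseg.  The struct page_req type and split_commit_list() are
hypothetical and stand in for nfs_page and the driver's own logic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real code uses nfs_page and list_head. */
struct lseg {
	int id;
};

struct page_req {
	struct lseg *lseg;	/* NULL: IO started without a layout */
	struct page_req *next;
};

/*
 * Split a singly linked request list into two lists, preserving order:
 * '*mds' receives requests with no lseg, '*pnfs' receives the rest.
 */
static void split_commit_list(struct page_req *head,
			      struct page_req **mds,
			      struct page_req **pnfs)
{
	struct page_req **mds_tail = mds;
	struct page_req **pnfs_tail = pnfs;

	*mds = NULL;
	*pnfs = NULL;
	while (head) {
		struct page_req *next = head->next;

		head->next = NULL;
		if (head->lseg) {
			*pnfs_tail = head;
			pnfs_tail = &head->next;
		} else {
			*mds_tail = head;
			mds_tail = &head->next;
		}
		head = next;
	}
}
```

A file layout driver would still have to split the pnfs list further
by data server; this sketch covers only the nfs/pnfs decision.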

Fred



* [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT"
  2010-06-08  4:18 [PATCH 00/24] LAYOUTGET invocation (rebased) Fred Isaman
@ 2010-06-08  4:18 ` Fred Isaman
  2010-06-08  4:18   ` [PATCH 02/24] Revert "pnfs: Enable O_DIRECT write path." Fred Isaman
  2010-06-09 18:06   ` [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT" Boaz Harrosh
  0 siblings, 2 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:18 UTC
  To: linux-nfs

This reverts commit 05277f5f5236462a11e7a20ebe9009449f8a463d.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/direct.c |   10 ----------
 1 files changed, 0 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e111e9f..02e5918 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -191,22 +191,12 @@ static ssize_t nfs_direct_wait(struct nfs_direct_req *dreq)
 {
 	ssize_t result = -EIOCBQUEUED;
 
-	if (!pnfs_use_rpc(NFS_SERVER(dreq->inode))) {
-		/* FIXME: Right now non-rpc layout types must perform
-		 * syncronous direct i/o.
-		 * New pNFS callback to wait on outstanding requests?
-		 */
-		result = 0;
-		goto set_result;
-	}
-
 	/* Async requests don't wait here */
 	if (dreq->iocb)
 		goto out;
 
 	result = wait_for_completion_killable(&dreq->completion);
 
-set_result:
 	if (!result)
 		result = dreq->error;
 	if (!result)
-- 
1.6.6.1



* [PATCH 02/24] Revert "pnfs: Enable O_DIRECT write path."
  2010-06-08  4:18 ` [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT" Fred Isaman
@ 2010-06-08  4:18   ` Fred Isaman
  2010-06-08  4:19     ` [PATCH 03/24] Revert "pnfs: Enable O_DIRECT read path." Fred Isaman
  2010-06-09 18:06   ` [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT" Boaz Harrosh
  1 sibling, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:18 UTC
  To: linux-nfs

This reverts commit 2faf680af973895bdfe19f2254b59dc1a153dd82.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/direct.c |   41 +----------------------------------------
 1 files changed, 1 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 02e5918..1148214 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -505,7 +505,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 		.workqueue = nfsiod_workqueue,
 		.flags = RPC_TASK_ASYNC,
 	};
-	enum pnfs_try_status trypnfs;
 
 	dreq->count = 0;
 	get_dreq(dreq);
@@ -529,11 +528,6 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 		 * Reuse data->task; data->args should not have changed
 		 * since the original request was sent.
 		 */
-		trypnfs = pnfs_try_to_write_data(data, &nfs_write_direct_ops,
-						 NFS_FILE_SYNC);
-		if (trypnfs == PNFS_ATTEMPTED)
-			continue;
-
 		nfs_direct_write_execute(data, &task_setup_data, &msg);
 	}
 
@@ -616,7 +610,6 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 		.workqueue = nfsiod_workqueue,
 		.flags = RPC_TASK_ASYNC,
 	};
-	enum pnfs_try_status trypnfs;
 
 	data->inode = dreq->inode;
 	data->cred = msg.rpc_cred;
@@ -630,11 +623,6 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq)
 	data->res.verf = &data->verf;
 	nfs_fattr_init(&data->fattr);
 
-	trypnfs = pnfs_try_to_commit(data, &nfs_commit_direct_ops,
-				     RPC_TASK_ASYNC);
-	if (trypnfs == PNFS_ATTEMPTED)
-		return;
-
 	nfs_direct_commit_execute(dreq, data, &task_setup_data, &msg);
 }
 
@@ -683,9 +671,6 @@ static void nfs_direct_write_result(struct rpc_task *task, void *calldata)
 {
 	struct nfs_write_data *data = calldata;
 
-	dprintk("%s: verf: %d stable %d\n", __func__,
-		data->res.verf->committed, data->args.stable);
-
 	if (nfs_writeback_done(task, data) != 0)
 		return;
 }
@@ -799,17 +784,6 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 	unsigned int pgbase;
 	int result;
 	ssize_t started = 0;
-	size_t pnfs_stripe_rem = count;
-	enum pnfs_try_status trypnfs;
-
-	/* pnfs_stripe_rem will be set to the remaining bytes in
-	 * the first stripe_unit (which for standard nfs is count)
-	 */
-	pnfs_direct_init_io(inode, ctx, count, pos, 1,
-			    &wsize, &pnfs_stripe_rem);
-
-	dprintk("%s: pos %llu count %Zu wsize %Zu\n",
-		__func__, pos, count, wsize);
 
 	do {
 		struct nfs_write_data *data;
@@ -818,12 +792,6 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 		pgbase = user_addr & ~PAGE_MASK;
 		bytes = min(wsize,count);
 
-#if defined(CONFIG_NFS_V4_1)
-		if (pnfs_enabled_sb(NFS_SERVER(inode))) {
-			bytes = min(bytes, pnfs_stripe_rem);
-			pnfs_stripe_rem = wsize;
-		}
-#endif /* CONFIG_NFS_V4_1 */
 		result = -ENOMEM;
 		data = nfs_writedata_alloc(nfs_page_array_len(pgbase, bytes));
 		if (unlikely(!data))
@@ -867,15 +835,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
 		data->res.verf = &data->verf;
 		nfs_fattr_init(&data->fattr);
 
-		trypnfs = pnfs_try_to_write_data(data, &nfs_write_direct_ops,
-						 sync);
-		if (trypnfs == PNFS_ATTEMPTED) {
-			result = pnfs_get_write_status(data);
-			if (result)
-				break;
-		} else if (nfs_direct_write_execute(data, &task_setup_data, &msg)) {
+		if (nfs_direct_write_execute(data, &task_setup_data, &msg))
 			break;
-		}
 
 		started += bytes;
 		user_addr += bytes;
-- 
1.6.6.1



* [PATCH 03/24] Revert "pnfs: Enable O_DIRECT read path."
  2010-06-08  4:18   ` [PATCH 02/24] Revert "pnfs: Enable O_DIRECT write path." Fred Isaman
@ 2010-06-08  4:19     ` Fred Isaman
  2010-06-08  4:19       ` [PATCH 04/24] Revert "pnfs: Add function to set up O_DIRECT I/O" Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC
  To: linux-nfs

This reverts commit fe1dbd120b6a94bbacec205d0a4ae40d36e314b5.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/direct.c |   26 +-------------------------
 1 files changed, 1 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 1148214..3ef9b0c 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -56,7 +56,6 @@
 
 #include "internal.h"
 #include "iostat.h"
-#include "pnfs.h"
 
 #define NFSDBG_FACILITY		NFSDBG_VFS
 
@@ -329,17 +328,6 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 	unsigned int pgbase;
 	int result;
 	ssize_t started = 0;
-	size_t pnfs_stripe_rem = count;
-	enum pnfs_try_status trypnfs;
-
-	/* pnfs_stripe_rem will be set to the remaining bytes in
-	 * the first stripe_unit (which for standard nfs is count)
-	 */
-	pnfs_direct_init_io(inode, ctx, count, pos, 0, &rsize,
-			    &pnfs_stripe_rem);
-
-	dprintk("%s: pos %llu count %Zu wsize %Zu\n",
-		__func__, pos, count, rsize);
 
 	do {
 		struct nfs_read_data *data;
@@ -347,12 +335,6 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 
 		pgbase = user_addr & ~PAGE_MASK;
 		bytes = min(rsize,count);
-#if defined(CONFIG_NFS_V4_1)
-		if (pnfs_enabled_sb(NFS_SERVER(inode))) {
-			bytes = min(bytes, pnfs_stripe_rem);
-			pnfs_stripe_rem = rsize;
-		}
-#endif /* CONFIG_NFS_V4_1 */
 
 		result = -ENOMEM;
 		data = nfs_readdata_alloc(nfs_page_array_len(pgbase, bytes));
@@ -393,14 +375,8 @@ static ssize_t nfs_direct_read_schedule_segment(struct nfs_direct_req *dreq,
 		data->res.eof = 0;
 		data->res.count = bytes;
 
-		trypnfs = pnfs_try_to_read_data(data, &nfs_read_direct_ops);
-		if (trypnfs == PNFS_ATTEMPTED) {
-			result = pnfs_get_read_status(data);
-			if (result)
-				break;
-		} else if (nfs_direct_read_execute(data, &task_setup_data, &msg)) {
+		if (nfs_direct_read_execute(data, &task_setup_data, &msg))
 			break;
-		}
 
 		started += bytes;
 		user_addr += bytes;
-- 
1.6.6.1



* [PATCH 04/24] Revert "pnfs: Add function to set up O_DIRECT I/O"
  2010-06-08  4:19     ` [PATCH 03/24] Revert "pnfs: Enable O_DIRECT read path." Fred Isaman
@ 2010-06-08  4:19       ` Fred Isaman
  2010-06-08  4:19         ` [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC
  To: linux-nfs

This reverts commit 4bc73cd4118b5d5b710c28c83a750bf4e02e8269.

Conflicts:

	fs/nfs/pnfs.c
	fs/nfs/pnfs.h

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pnfs.c |   31 -------------------------------
 fs/nfs/pnfs.h |   25 -------------------------
 2 files changed, 0 insertions(+), 56 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2006926..2f8fa3c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1396,37 +1396,6 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
 	pnfs_set_pg_test(inode, pgio);
 }
 
-/* Retrieve I/O parameters for O_DIRECT.
- * Out Args:
- * iosize    - min of boundary and (rsize or wsize)
- * remaining - # bytes remaining in the current stripe unit
- */
-void
-_pnfs_direct_init_io(struct inode *inode, struct nfs_open_context *ctx,
-		     size_t count, loff_t loff, int iswrite, size_t *iosize,
-		     size_t *remaining)
-{
-	struct nfs_server *nfss = NFS_SERVER(inode);
-	u32 boundary;
-	unsigned int rwsize;
-
-	if (count <= 0 ||
-	    pnfs_update_layout(inode, ctx, count, loff, IOMODE_READ, NULL))
-		return;
-
-	if (iswrite)
-		rwsize = nfss->wsize;
-	else
-		rwsize = nfss->rsize;
-
-	boundary = pnfs_getboundary(inode);
-
-	*iosize = min(rwsize, boundary);
-	*remaining = boundary - (do_div(loff, boundary));
-
-	dprintk("%s Rem %Zu iosize %Zu\n", __func__, *remaining, *iosize);
-}
-
 /*
  * Get a layoutout for COMMIT
  */
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index a71145e..214d567 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -65,9 +65,6 @@ void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segm
 void pnfs_set_layout_stateid(struct pnfs_layout_type *lo,
 			     const nfs4_stateid *stateid);
 void pnfs_destroy_layout(struct nfs_inode *);
-void _pnfs_direct_init_io(struct inode *inode, struct nfs_open_context *ctx,
-			  size_t count, loff_t loff, int iswrite,
-			  size_t *rwsize, size_t *remaining);
 
 #define PNFS_EXISTS_LDIO_OP(srv, opname) ((srv)->pnfs_curr_ld &&	\
 				     (srv)->pnfs_curr_ld->ld_io_ops &&	\
@@ -183,20 +180,6 @@ static inline int pnfs_get_read_status(struct nfs_read_data *data)
 	return data->pdata.pnfs_error;
 }
 
-static inline void pnfs_direct_init_io(struct inode *inode,
-				       struct nfs_open_context *ctx,
-				       size_t count, loff_t loff, int iswrite,
-				       size_t *iosize, size_t *remaining)
-{
-	struct nfs_server *nfss = NFS_SERVER(inode);
-
-	if (pnfs_enabled_sb(nfss))
-		return _pnfs_direct_init_io(inode, ctx, count, loff, iswrite,
-					    iosize, remaining);
-
-	return;
-}
-
 static inline int pnfs_use_rpc(struct nfs_server *nfss)
 {
 	if (pnfs_enabled_sb(nfss))
@@ -242,14 +225,6 @@ static inline int pnfs_get_read_status(struct nfs_read_data *data)
 	return 0;
 }
 
-/* Set num of remaining bytes, which is everything */
-static inline void pnfs_direct_init_io(struct inode *inode,
-				       struct nfs_open_context *ctx,
-				       size_t count, loff_t loff, int iswrite,
-				       size_t *iosize, size_t *remaining)
-{
-}
-
 static inline int pnfs_use_rpc(struct nfs_server *nfss)
 {
 	return 1;
-- 
1.6.6.1



* [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error
  2010-06-08  4:19       ` [PATCH 04/24] Revert "pnfs: Add function to set up O_DIRECT I/O" Fred Isaman
@ 2010-06-08  4:19         ` Fred Isaman
  2010-06-08  4:19           ` [PATCH 06/24] pnfs: filelayout: clean and breakup nfs4_pnfs_dserver_get Fred Isaman
  2010-06-09 18:18           ` [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error Boaz Harrosh
  0 siblings, 2 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC
  To: linux-nfs

This should be squashed into my (or Alexandros's) submission patches for
version 2.  Compensate for Alexandros returning an error but still
assigning lseg.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pnfs.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2f8fa3c..b990471 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1064,6 +1064,8 @@ pnfs_update_layout(struct inode *ino,
 	DEFINE_WAIT(__wait);
 	int result = 0;
 
+	if (take_ref)
+		*lsegpp = NULL;
 	lo = get_lock_alloc_layout(ino);
 	if (IS_ERR(lo)) {
 		dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
@@ -1078,6 +1080,7 @@ pnfs_update_layout(struct inode *ino,
 			put_lseg(lseg);
 
 		/* someone is cleaning the layout */
+		lseg = NULL;
 		result = -EAGAIN;
 		goto out_put;
 	}
-- 
1.6.6.1



* [PATCH 06/24] pnfs: filelayout: clean and breakup nfs4_pnfs_dserver_get
  2010-06-08  4:19         ` [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error Fred Isaman
@ 2010-06-08  4:19           ` Fred Isaman
  2010-06-08  4:19             ` [PATCH 07/24] pnfs: filelayout: remove some dead code from filelayout_commit Fred Isaman
  2010-06-09 18:18           ` [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error Boaz Harrosh
  1 sibling, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC
  To: linux-nfs

Rewrite nfs4_pnfs_dserver_get() as two functions, nfs4_fl_calc_ds_index()
and nfs4_fl_prepare_ds().  This cleans up the code a bit and prepares for
a more extensive rewrite of filelayout_commit().

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/nfs4filelayout.c    |   75 ++++++++++++----------------------
 fs/nfs/nfs4filelayout.h    |   33 +++++++--------
 fs/nfs/nfs4filelayoutdev.c |   95 +++++++++++++++----------------------------
 3 files changed, 75 insertions(+), 128 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index b0cda5d..2ffca74 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -196,8 +196,8 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
 {
 	struct inode *inode = PNFS_INODE(layoutid);
 	struct nfs4_filelayout_segment *flseg;
-	struct nfs4_pnfs_dserver dserver;
-	int status;
+	struct nfs4_pnfs_ds *ds;
+	u32 idx;
 
 	dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu\n",
 		__func__, inode->i_ino, nr_pages, pgbase, count, offset);
@@ -205,23 +205,19 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
 	flseg = LSEG_LD_DATA(data->pdata.lseg);
 
 	/* Retrieve the correct rpc_client for the byte range */
-	status = nfs4_pnfs_dserver_get(data->pdata.lseg,
-				       offset,
-				       count,
-				       &dserver);
-	if (status) {
-		printk(KERN_ERR "%s: dserver get failed status %d use MDS\n",
-		       __func__, status);
+	idx = nfs4_fl_calc_ds_index(data->pdata.lseg, offset);
+	ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx);
+	if (!ds) {
+		printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
 		return PNFS_NOT_ATTEMPTED;
 	}
-
 	dprintk("%s USE DS:ip %x %s\n", __func__,
-		htonl(dserver.ds->ds_ip_addr), dserver.ds->r_addr);
+		htonl(ds->ds_ip_addr), ds->r_addr);
 
 	/* just try the first data server for the index..*/
-	data->fldata.pnfs_client = dserver.ds->ds_clp->cl_rpcclient;
-	data->fldata.ds_nfs_client = dserver.ds->ds_clp;
-	data->args.fh = dserver.fh;
+	data->fldata.ds_nfs_client = ds->ds_clp;
+	data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
+	data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);
 
 	/* Now get the file offset on the dserver
 	 * Set the read offset to this offset, and
@@ -255,32 +251,26 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
 {
 	struct inode *inode = PNFS_INODE(layoutid);
 	struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(data->pdata.lseg);
-	struct nfs4_pnfs_dserver dserver;
-	int status;
+	struct nfs4_pnfs_ds *ds;
+	u32 idx;
 
 	dprintk("--> %s ino %lu nr_pages %d pgbase %u req %Zu@%llu sync %d\n",
 		__func__, inode->i_ino, nr_pages, pgbase, count, offset, sync);
 
 	/* Retrieve the correct rpc_client for the byte range */
-	status = nfs4_pnfs_dserver_get(data->pdata.lseg,
-				       offset,
-				       count,
-				       &dserver);
-
-	if (status) {
-		printk(KERN_ERR "%s: dserver get failed status %d use MDS\n",
-		       __func__, status);
+	idx = nfs4_fl_calc_ds_index(data->pdata.lseg, offset);
+	ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx);
+	if (!ds) {
+		printk(KERN_ERR "%s: prepare_ds failed, use MDS\n", __func__);
 		return PNFS_NOT_ATTEMPTED;
 	}
-
 	dprintk("%s ino %lu %Zu@%llu DS:%x:%hu %s\n",
 		__func__, inode->i_ino, count, offset,
-		htonl(dserver.ds->ds_ip_addr), ntohs(dserver.ds->ds_port),
-		dserver.ds->r_addr);
+		htonl(ds->ds_ip_addr), ntohs(ds->ds_port), ds->r_addr);
 
-	data->fldata.pnfs_client = dserver.ds->ds_clp->cl_rpcclient;
-	data->fldata.ds_nfs_client = dserver.ds->ds_clp;
-	data->args.fh = dserver.fh;
+	data->fldata.ds_nfs_client = ds->ds_clp;
+	data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
+	data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);
 
 	/* Get the file offset on the dserver. Set the write offset to
 	 * this offset and save the original offset.
@@ -568,15 +558,12 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 {
 	struct nfs4_filelayout_segment *nfslay;
 	struct nfs_write_data   *dsdata = NULL;
-	struct nfs4_pnfs_dserver dserver;
 	struct nfs4_pnfs_ds *ds;
 	struct nfs_page *req, *reqt;
 	struct list_head *pos, *tmp, head, head2;
 	loff_t file_offset, comp_offset;
 	size_t stripesz, cbytes;
-	int status;
 	enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
-	struct nfs4_file_layout_dsaddr *dsaddr;
 	u32 idx1, idx2;
 
 	nfslay = LSEG_LD_DATA(data->pdata.lseg);
@@ -593,9 +580,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 	stripesz = filelayout_get_stripesize(layoutid);
 	dprintk("%s stripesize %Zd\n", __func__, stripesz);
 
-	dsaddr = container_of(data->pdata.lseg->deviceid,
-			      struct nfs4_file_layout_dsaddr, deviceid);
-
 	INIT_LIST_HEAD(&head);
 	INIT_LIST_HEAD(&head2);
 	list_add(&head, &data->pages);
@@ -609,19 +593,13 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 		file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
 
 		/* Get dserver for the current page */
-		status = nfs4_pnfs_dserver_get(data->pdata.lseg,
-					       file_offset,
-					       req->wb_bytes,
-					       &dserver);
-		if (status) {
+		idx1 = nfs4_fl_calc_ds_index(data->pdata.lseg, file_offset);
+		ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx1);
+		if (!ds) {
 			data->pdata.pnfs_error = -EIO;
 			goto err_rewind;
 		}
 
-		/* Get its index */
-		idx1 = filelayout_dserver_get_index(file_offset, dsaddr,
-						    nfslay);
-
 		/* Gather all pages going to the current data server by
 		 * comparing their indices.
 		 * XXX: This recalculates the indices unecessarily.
@@ -630,8 +608,8 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 		list_for_each_safe(pos, tmp, &head) {
 			reqt = nfs_list_entry(pos);
 			comp_offset = (loff_t)reqt->wb_index << PAGE_CACHE_SHIFT;
-			idx2 = filelayout_dserver_get_index(comp_offset,
-							    dsaddr, nfslay);
+			idx2 = nfs4_fl_calc_ds_index(data->pdata.lseg,
+						     comp_offset);
 			if (idx1 == idx2) {
 				nfs_list_remove_request(reqt);
 				nfs_list_add_request(reqt, &head2);
@@ -655,10 +633,9 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 		list_add(&dsdata->pages, &head2);
 		list_del_init(&head2);
 
-		ds = dserver.ds;
 		dsdata->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
 		dsdata->fldata.ds_nfs_client = ds->ds_clp;
-		dsdata->args.fh = dserver.fh;
+		dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);
 
 		dprintk("%s: Initiating commit: %Zu@%llu USE DS:\n",
 			__func__, cbytes, file_offset);
diff --git a/fs/nfs/nfs4filelayout.h b/fs/nfs/nfs4filelayout.h
index fbf307c..3697926 100644
--- a/fs/nfs/nfs4filelayout.h
+++ b/fs/nfs/nfs4filelayout.h
@@ -26,6 +26,9 @@
 
 #define FILE_MT(inode) ((struct filelayout_mount_type *) \
 			(NFS_SERVER(inode)->pnfs_mountid->mountid))
+#define FILE_DSADDR(lseg) (container_of(lseg->deviceid, \
+					struct nfs4_file_layout_dsaddr, \
+					deviceid))
 
 enum stripetype4 {
 	STRIPE_SPARSE = 1,
@@ -55,16 +58,6 @@ struct nfs4_pnfs_dev_hlist {
 	struct hlist_head	dev_list[NFS4_PNFS_DEV_HASH_SIZE];
 };
 
-/*
- * Used for I/O, Maps a stripe index to a layout file handle and a
- * multipath data server.
- */
-
-struct nfs4_pnfs_dserver {
-	struct nfs_fh        *fh;
-	struct nfs4_pnfs_ds	*ds;
-};
-
 struct nfs4_filelayout_segment {
 	u32 stripe_type;
 	u32 commit_through_mds;
@@ -87,18 +80,24 @@ struct filelayout_mount_type {
 	struct super_block *fl_sb;
 };
 
+static inline struct nfs_fh *
+nfs4_fl_select_ds_fh(struct nfs4_filelayout_segment *flseg, u32 idx)
+{
+	/* FRED - what about case == 0??? */
+	if (flseg->num_fh == 1)
+		return &flseg->fh_array[0];
+	else
+		return &flseg->fh_array[idx];
+}
+
 extern struct pnfs_client_operations *pnfs_callback_ops;
 
 extern void nfs4_fl_free_deviceid_callback(struct kref *);
 extern void print_ds(struct nfs4_pnfs_ds *ds);
 char *deviceid_fmt(const struct pnfs_deviceid *dev_id);
-int nfs4_pnfs_dserver_get(struct pnfs_layout_segment *lseg,
-			  loff_t offset,
-			  size_t count,
-			  struct nfs4_pnfs_dserver *dserver);
-u32 filelayout_dserver_get_index(loff_t offset,
-				 struct nfs4_file_layout_dsaddr *di,
-				 struct nfs4_filelayout_segment *layout);
+u32 nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset);
+struct nfs4_pnfs_ds *nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg,
+					u32 ds_idx);
 extern struct nfs4_file_layout_dsaddr *
 nfs4_pnfs_device_item_find(struct nfs_client *, struct pnfs_deviceid *dev_id);
 struct nfs4_file_layout_dsaddr *
diff --git a/fs/nfs/nfs4filelayoutdev.c b/fs/nfs/nfs4filelayoutdev.c
index f5eb5f1..404dd5f 100644
--- a/fs/nfs/nfs4filelayoutdev.c
+++ b/fs/nfs/nfs4filelayoutdev.c
@@ -554,90 +554,61 @@ nfs4_pnfs_device_item_find(struct nfs_client *clp, struct pnfs_deviceid *id)
 		container_of(d, struct nfs4_file_layout_dsaddr, deviceid);
 }
 
-/* Want res = ((offset / layout->stripe_unit) % dsaddr->stripe_count)
+/* Want res = (offset - layout->pattern_offset)/ layout->stripe_unit
  * Then: ((res + fsi) % dsaddr->stripe_count)
  */
-u32
-filelayout_dserver_get_index(loff_t offset,
-			     struct nfs4_file_layout_dsaddr *dsaddr,
-			     struct nfs4_filelayout_segment *layout)
+static inline u32
+_nfs4_fl_calc_j_index(loff_t offset,
+		      struct nfs4_file_layout_dsaddr *dsaddr,
+		      struct nfs4_filelayout_segment *layout)
 {
-	u64 tmp, tmp2;
+	u64 tmp;
 
-	tmp = offset;
+	tmp = offset - layout->pattern_offset;
 	do_div(tmp, layout->stripe_unit);
-	tmp2 = do_div(tmp, dsaddr->stripe_count) + layout->first_stripe_index;
-	return do_div(tmp2, dsaddr->stripe_count);
+	tmp += layout->first_stripe_index;
+	return do_div(tmp, dsaddr->stripe_count);
 }
 
-/* Retrieve the rpc client for a specified byte range
- * in 'inode' by filling in the contents of 'dserver'.
- */
-int
-nfs4_pnfs_dserver_get(struct pnfs_layout_segment *lseg,
-		      loff_t offset,
-		      size_t count,
-		      struct nfs4_pnfs_dserver *dserver)
+u32
+nfs4_fl_calc_ds_index(struct pnfs_layout_segment *lseg, loff_t offset)
 {
-	struct nfs4_filelayout_segment *layout = LSEG_LD_DATA(lseg);
-	struct inode *inode = PNFS_INODE(lseg->layout);
-	struct nfs_server *mds_srv = NFS_SERVER(inode);
+	struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
 	struct nfs4_file_layout_dsaddr *dsaddr;
-	u64 tmp, tmp2;
-	u32 stripe_idx, end_idx, ds_idx;
-
-	if (!layout)
-		return 1;
-
-	dsaddr = container_of(lseg->deviceid, struct nfs4_file_layout_dsaddr,
-			      deviceid);
-
-	stripe_idx = filelayout_dserver_get_index(offset, dsaddr, layout);
-
-	/* For debugging, ensure entire requested range is in this dserver */
-	tmp = offset + count - 1;
-	do_div(tmp, layout->stripe_unit);
-	tmp2 = do_div(tmp, dsaddr->stripe_count) + layout->first_stripe_index;
-	end_idx = do_div(tmp2, dsaddr->stripe_count);
+	u32 j;
 
-	dprintk("%s: offset=%Lu, count=%Zu, si=%u, dsi=%u, "
-		"stripe_count=%u, stripe_unit=%u first_stripe_index %u\n",
-		__func__,
-		offset, count, stripe_idx, end_idx, dsaddr->stripe_count,
-		layout->stripe_unit, layout->first_stripe_index);
+	dsaddr = FILE_DSADDR(lseg);
+	j = _nfs4_fl_calc_j_index(offset, dsaddr, flseg);
+	return dsaddr->stripe_indices[j];
+}
 
-	BUG_ON(end_idx != stripe_idx);
-	BUG_ON(stripe_idx >= dsaddr->stripe_count);
+struct nfs4_pnfs_ds *
+nfs4_fl_prepare_ds(struct pnfs_layout_segment *lseg, u32 ds_idx)
+{
+	struct nfs4_filelayout_segment *flseg = LSEG_LD_DATA(lseg);
+	struct nfs4_file_layout_dsaddr *dsaddr;
 
-	ds_idx = dsaddr->stripe_indices[stripe_idx];
+	dsaddr = FILE_DSADDR(lseg);
 	if (dsaddr->ds_list[ds_idx] == NULL) {
-		printk(KERN_ERR "%s: No data server for device id (%s)!! \n",
-			__func__, deviceid_fmt(&layout->dev_id));
-		return 1;
+		printk(KERN_ERR "%s: No data server for device id (%s)!!\n",
+			__func__, deviceid_fmt(&flseg->dev_id));
+		return NULL;
 	}
 
 	if (!dsaddr->ds_list[ds_idx]->ds_clp) {
 		int err;
 
-		err = nfs4_pnfs_ds_create(mds_srv, dsaddr->ds_list[ds_idx]);
+		err = nfs4_pnfs_ds_create(PNFS_NFS_SERVER(lseg->layout),
+					  dsaddr->ds_list[ds_idx]);
 		if (err) {
 			printk(KERN_ERR "%s nfs4_pnfs_ds_create error %d\n",
 			       __func__, err);
-			return 1;
+			return NULL;
 		}
 	}
-	dserver->ds = dsaddr->ds_list[ds_idx];
+	dprintk("%s: dev_id=%s, ds_idx=%u\n",
+		__func__, deviceid_fmt(&flseg->dev_id), ds_idx);
 
-	if (layout->num_fh == 1)
-		dserver->fh = &layout->fh_array[0];
-	else
-		dserver->fh = &layout->fh_array[ds_idx];
-
-	dprintk("%s: dev_id=%s, ip:port=%s, ds_idx=%u stripe_idx=%u, "
-		"offset=%llu, count=%Zu\n",
-		__func__, deviceid_fmt(&layout->dev_id),
-		dserver->ds->r_addr,
-		ds_idx, stripe_idx, offset, count);
-
-	return 0;
+	return dsaddr->ds_list[ds_idx];
 }
+
-- 
1.6.6.1
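
The index arithmetic that patch 06 moves into _nfs4_fl_calc_j_index()
and nfs4_fl_calc_ds_index() can be tried out in user space with the
sketch below.  It is only an illustration: struct fl_params and the
function names are hypothetical, plain 64-bit division replaces the
kernel's do_div(), and the uint8_t element type of stripe_indices is
an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bundle of the layout and device fields the patch reads. */
struct fl_params {
	uint64_t pattern_offset;
	uint32_t stripe_unit;
	uint32_t first_stripe_index;
	uint32_t stripe_count;
	const uint8_t *stripe_indices;	/* stripe slot -> data server index */
};

/*
 * j = ((offset - pattern_offset) / stripe_unit + first_stripe_index)
 *     % stripe_count
 */
static uint32_t calc_j_index(uint64_t offset, const struct fl_params *p)
{
	uint64_t tmp;

	assert(p->stripe_unit != 0 && p->stripe_count != 0);
	assert(offset >= p->pattern_offset);

	tmp = (offset - p->pattern_offset) / p->stripe_unit;
	tmp += p->first_stripe_index;
	return (uint32_t)(tmp % p->stripe_count);
}

/* Map the stripe slot j to a data server index via the indices table. */
static uint32_t calc_ds_index(uint64_t offset, const struct fl_params *p)
{
	return p->stripe_indices[calc_j_index(offset, p)];
}
```

For example, with stripe_unit 4096, stripe_count 4,
first_stripe_index 1 and pattern_offset 0, offset 12288 lies in stripe
unit 3, so j = (3 + 1) % 4 = 0, and the data server index is whatever
stripe_indices[0] holds.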



* [PATCH 07/24] pnfs: filelayout: remove some dead code from filelayout_commit
  2010-06-08  4:19           ` [PATCH 06/24] pnfs: filelayout: clean and breakup nfs4_pnfs_dserver_get Fred Isaman
@ 2010-06-08  4:19             ` Fred Isaman
  2010-06-08  4:19               ` [PATCH 08/24] pnfs: remove PNFS_LAYOUTGET_ON_OPEN Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC
  To: linux-nfs

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/nfs4filelayout.c |   10 ++--------
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 2ffca74..c15b90a 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -562,7 +562,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 	struct nfs_page *req, *reqt;
 	struct list_head *pos, *tmp, head, head2;
 	loff_t file_offset, comp_offset;
-	size_t stripesz, cbytes;
 	enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
 	u32 idx1, idx2;
 
@@ -577,9 +576,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 		return PNFS_NOT_ATTEMPTED;
 	}
 
-	stripesz = filelayout_get_stripesize(layoutid);
-	dprintk("%s stripesize %Zd\n", __func__, stripesz);
-
 	INIT_LIST_HEAD(&head);
 	INIT_LIST_HEAD(&head2);
 	list_add(&head, &data->pages);
@@ -587,7 +583,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 
 	/* COMMIT to each Data Server */
 	while (!list_empty(&head)) {
-		cbytes = 0;
 		req = nfs_list_entry(head.next);
 
 		file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
@@ -613,7 +608,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 			if (idx1 == idx2) {
 				nfs_list_remove_request(reqt);
 				nfs_list_add_request(reqt, &head2);
-				cbytes += reqt->wb_bytes;
 			}
 		}
 
@@ -637,8 +631,8 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 		dsdata->fldata.ds_nfs_client = ds->ds_clp;
 		dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);
 
-		dprintk("%s: Initiating commit: %Zu@%llu USE DS:\n",
-			__func__, cbytes, file_offset);
+		dprintk("%s: Initiating commit: %llu USE DS:\n",
+			__func__, file_offset);
 		print_ds(ds);
 
 		/* Send COMMIT to data server */
-- 
1.6.6.1



* [PATCH 08/24] pnfs: remove PNFS_LAYOUTGET_ON_OPEN
  2010-06-08  4:19             ` [PATCH 07/24] pnfs: filelayout: remove some dead code from filelayout_commit Fred Isaman
@ 2010-06-08  4:19               ` Fred Isaman
  2010-06-08  4:19                 ` [PATCH 09/24] pnfs: track the number of outstanding commits Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC
  To: linux-nfs

It is not used anywhere.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/nfs4filelayout.c   |    3 +--
 include/linux/nfs4_pnfs.h |   14 --------------
 2 files changed, 1 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index c15b90a..7fc93e6 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -710,8 +710,7 @@ struct layoutdriver_io_operations filelayout_io_operations = {
 };
 
 struct layoutdriver_policy_operations filelayout_policy_operations = {
-	.flags                 = PNFS_USE_RPC_CODE |
-	                         PNFS_LAYOUTGET_ON_OPEN,
+	.flags                 = PNFS_USE_RPC_CODE,
 	.get_stripesize        = filelayout_get_stripesize,
 	.pg_test               = filelayout_pg_test,
 };
diff --git a/include/linux/nfs4_pnfs.h b/include/linux/nfs4_pnfs.h
index 53626d4..80acd7a 100644
--- a/include/linux/nfs4_pnfs.h
+++ b/include/linux/nfs4_pnfs.h
@@ -171,11 +171,6 @@ enum layoutdriver_policy_flags {
 	/* Should the NFS req. gather algorithm cross stripe boundaries? */
 	PNFS_GATHER_ACROSS_STRIPES	= 1 << 1,
 
-	/* Should the pNFS client issue a layoutget call in the
-	 * same compound as the OPEN operation?
-	 */
-	PNFS_LAYOUTGET_ON_OPEN		= 1 << 2,
-
 	/* Should the pNFS client commit and return the layout upon a setattr */
 	PNFS_LAYOUTRET_ON_SETATTR	= 1 << 3,
 };
@@ -204,15 +199,6 @@ pnfs_ld_gather_across_stripes(struct pnfs_layoutdriver_type *ld)
 	return ld->ld_policy_ops->flags & PNFS_GATHER_ACROSS_STRIPES;
 }
 
-/* Should the pNFS client issue a layoutget call in the
- * same compound as the OPEN operation?
- */
-static inline int
-pnfs_ld_layoutget_on_open(struct pnfs_layoutdriver_type *ld)
-{
-	return ld->ld_policy_ops->flags & PNFS_LAYOUTGET_ON_OPEN;
-}
-
 /* Should the pNFS client commit and return the layout upon a setattr
  */
 static inline int
-- 
1.6.6.1



* [PATCH 09/24] pnfs: track the number of outstanding commits
  2010-06-08  4:19               ` [PATCH 08/24] pnfs: remove PNFS_LAYOUTGET_ON_OPEN Fred Isaman
@ 2010-06-08  4:19                 ` Fred Isaman
  2010-06-08  4:19                   ` [PATCH 10/24] pnfs_submit: mandate basic io path operations for layout drivers Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Commit 71d0a6112a3 "NFS: Fix an unstable write data integrity race"
adds locking which is incompatible with the current file layout commit code,
which splits the commit into several RPCs cloned from the original.
Add a counter so the layout driver can properly unlock only once.
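
The counting scheme described above can be sketched in userspace C. This is only an illustration under stated assumptions: struct commit_data, commit_alloc and commit_put are hypothetical stand-ins for struct nfs_write_data, the clone helper and kref_put, and a plain int replaces struct kref, so the sketch is single-threaded only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfs_write_data; not the kernel type. */
struct commit_data {
	int refcount;               /* plain int here; the patch uses struct kref */
	struct commit_data *parent; /* NULL for the original commit */
	int *unlock_count;          /* counts simulated "clear commit lock" calls */
};

/* Allocate a commit; a clone pins its parent with an extra reference. */
static struct commit_data *commit_alloc(struct commit_data *parent,
					int *unlock_count)
{
	struct commit_data *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;
	d->parent = parent;
	d->unlock_count = unlock_count;
	if (parent)
		parent->refcount++;
	return d;
}

/* Drop one reference; only the original (parentless) commit unlocks. */
static void commit_put(struct commit_data *d)
{
	struct commit_data *parent;

	if (--d->refcount > 0)
		return;
	parent = d->parent;
	if (!parent)
		(*d->unlock_count)++;   /* stands in for nfs_commit_clear_lock() */
	free(d);
	if (parent)
		commit_put(parent);     /* a finished clone releases its parent */
}
```

With one original and two clones, the simulated unlock fires exactly once, after the last clone is released, regardless of the order in which the three are put.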

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/nfs4filelayout.c |    3 +++
 fs/nfs/write.c          |   19 ++++++++++++++++---
 include/linux/nfs_xdr.h |    2 ++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 7fc93e6..e36c95d 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -518,6 +518,9 @@ filelayout_clone_write_data(struct nfs_write_data *old)
 	new = nfs_commitdata_alloc();
 	if (!new)
 		goto out;
+	kref_init(&new->refcount);
+	new->parent      = old;
+	kref_get(&old->refcount);
 	new->inode       = old->inode;
 	new->cred        = old->cred;
 	new->args.offset = 0;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e42cd2b..13319c8 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1369,7 +1369,8 @@ static int nfs_commit_rpcsetup(struct list_head *head,
 	data->res.fattr   = &data->fattr;
 	data->res.verf    = &data->verf;
 	nfs_fattr_init(&data->fattr);
-
+	kref_init(&data->refcount);
+	data->parent      = NULL;
 	data->args.context = first->wb_context;  /* used by commit done */
 
 	return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
@@ -1421,6 +1422,19 @@ static void nfs_commit_done(struct rpc_task *task, void *calldata)
 		return;
 }
 
+static inline void nfs_commit_cleanup(struct kref *kref)
+{
+	struct nfs_write_data *data;
+
+	data = container_of(kref, struct nfs_write_data, refcount);
+	/* Clear lock only when all cloned commits are finished */
+	if (data->parent)
+		kref_put(&data->parent->refcount, nfs_commit_cleanup);
+	else
+		nfs_commit_clear_lock(NFS_I(data->inode));
+	nfs_commitdata_release(data);
+}
+
 static void nfs_commit_release(void *calldata)
 {
 	struct nfs_write_data	*data = calldata;
@@ -1458,8 +1472,7 @@ static void nfs_commit_release(void *calldata)
 	next:
 		nfs_clear_page_tag_locked(req);
 	}
-	nfs_commit_clear_lock(NFS_I(data->inode));
-	nfs_commitdata_release(calldata);
+	kref_put(&data->refcount, nfs_commit_cleanup);
 }
 
 static const struct rpc_call_ops nfs_commit_ops = {
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 634a199..07d6dd2 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1010,6 +1010,8 @@ struct nfs_read_data {
 };
 
 struct nfs_write_data {
+	struct kref		refcount;	/* For pnfs commit splitting */
+	struct nfs_write_data	*parent;	/* For pnfs commit splitting */
 	int			flags;
 	struct rpc_task		task;
 	struct inode		*inode;
-- 
1.6.6.1



* [PATCH 10/24] pnfs_submit: mandate basic io path operations for layout drivers
  2010-06-08  4:19                 ` [PATCH 09/24] pnfs: track the number of outstanding commits Fred Isaman
@ 2010-06-08  4:19                   ` Fred Isaman
  2010-06-08  4:19                     ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Mandate read_pagelist, write_pagelist, and commit.  This will help
avoid needless checks in the io path.
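
The idea of validating once at registration, so the io path can call the operations unconditionally, might look like the following standalone sketch. struct io_ops, io_ops_complete and noop are hypothetical names used for illustration; the real check lives in pnfs_register_layoutdriver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced operations table; not the kernel structure. */
struct io_ops {
	int (*read_pagelist)(void);
	int (*write_pagelist)(void);
	int (*commit)(void);
};

/* Return 1 only if every mandatory operation is provided, else 0. */
static int io_ops_complete(const struct io_ops *ops)
{
	return ops && ops->read_pagelist && ops->write_pagelist &&
	       ops->commit;
}

/* Placeholder operation used to populate a table in examples. */
static int noop(void)
{
	return 0;
}
```

A driver whose table fails this check is rejected up front, so later callers do not need a per-call NULL test on these three pointers.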

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pnfs.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b990471..836cb0f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -272,6 +272,14 @@ pnfs_register_layoutdriver(struct pnfs_layoutdriver_type *ld_type)
 		return NULL;
 	}
 
+	if (!io_ops->read_pagelist || !io_ops->write_pagelist ||
+	    !io_ops->commit) {
+		printk(KERN_ERR "%s Layout driver must provide "
+		       "read_pagelist, write_pagelist, and commit.\n",
+		       __func__);
+		return NULL;
+	}
+
 	pnfs_mod = kmalloc(sizeof(struct pnfs_module), GFP_KERNEL);
 	if (pnfs_mod != NULL) {
 		dprintk("%s Registering id:%u name:%s\n",
-- 
1.6.6.1



* [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions
  2010-06-08  4:19                   ` [PATCH 10/24] pnfs_submit: mandate basic io path operations for layout drivers Fred Isaman
@ 2010-06-08  4:19                     ` Fred Isaman
  2010-06-08  4:19                       ` [PATCH 12/24] pnfs_submit: stash and refcount lseg in read path Fred Isaman
  2010-06-09 18:58                       ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Boaz Harrosh
  0 siblings, 2 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

These will be used in the generic code.  Set so they will compile away to
nothing if CONFIG_NFS_V4_1 is not set.

This requires kref_put to be under lock.  See rule 3 of Documentation/kref.txt.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pnfs.c |   45 ++++++++++++++++++++++++++++++++-------------
 fs/nfs/pnfs.h |   44 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 75 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 836cb0f..a74a4b6 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -436,7 +436,25 @@ destroy_lseg(struct kref *kref)
 	PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
 }
 
-static inline void
+static void
+put_lseg_locked(struct pnfs_layout_segment *lseg)
+{
+	bool do_wake_up;
+	struct nfs_inode *nfsi;
+
+	if (!lseg)
+		return;
+
+	dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
+		atomic_read(&lseg->kref.refcount), lseg->valid);
+	do_wake_up = !lseg->valid;
+	nfsi = PNFS_NFS_INODE(lseg->layout);
+	kref_put(&lseg->kref, destroy_lseg);
+	if (do_wake_up)
+		wake_up(&nfsi->lo_waitq);
+}
+
+void
 put_lseg(struct pnfs_layout_segment *lseg)
 {
 	bool do_wake_up;
@@ -449,7 +467,9 @@ put_lseg(struct pnfs_layout_segment *lseg)
 		atomic_read(&lseg->kref.refcount), lseg->valid);
 	do_wake_up = !lseg->valid;
 	nfsi = PNFS_NFS_INODE(lseg->layout);
+	lock_current_layout(nfsi);
 	kref_put(&lseg->kref, destroy_lseg);
+	unlock_current_layout(nfsi);
 	if (do_wake_up)
 		wake_up(&nfsi->lo_waitq);
 }
@@ -674,7 +694,7 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
 			lseg, lseg->range.iomode, lseg->range.offset,
 			lseg->range.length);
 		list_del(&lseg->fi_list);
-		put_lseg(lseg);
+		put_lseg_locked(lseg);
 	}
 
 	dprintk("%s:Return\n", __func__);
@@ -1033,7 +1053,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
 		    (lseg->valid || !only_valid)) {
 			ret = lseg;
 			if (take_ref)
-				kref_get(&ret->kref);
+				get_lseg(ret);
 			break;
 		}
 		if (cmp_layout(range, &lseg->range) > 0)
@@ -1053,7 +1073,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
  * returned to the caller.
  */
 int
-pnfs_update_layout(struct inode *ino,
+_pnfs_update_layout(struct inode *ino,
 		   struct nfs_open_context *ctx,
 		   u64 count,
 		   loff_t pos,
@@ -1085,8 +1105,7 @@ pnfs_update_layout(struct inode *ino,
 	lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
 	if (lseg && !lseg->valid) {
 		if (take_ref)
-			put_lseg(lseg);
-
+			put_lseg_locked(lseg);
 		/* someone is cleaning the layout */
 		lseg = NULL;
 		result = -EAGAIN;
@@ -1262,7 +1281,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
 	init_lseg(lo, lseg);
 	lseg->range = res->lseg;
 	if (lgp->lsegpp) {
-		kref_get(&lseg->kref);
+		get_lseg(lseg);
 		*lgp->lsegpp = lseg;
 	}
 
@@ -1380,7 +1399,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 	readahead_range(inode, pages, &loff, &count);
 
 	if (count > 0) {
-		status = pnfs_update_layout(inode, ctx, count,
+		status = _pnfs_update_layout(inode, ctx, count,
 						loff, IOMODE_READ, NULL);
 		dprintk("%s virt update returned %d\n", __func__, status);
 		if (status != 0)
@@ -1438,7 +1457,7 @@ pnfs_update_layout_commit(struct inode *inode,
 	if (start == 0 && count == 0)
 		count = NFS4_MAX_UINT64;
 
-	status = pnfs_update_layout(inode, nfs_page->wb_context,
+	status = _pnfs_update_layout(inode, nfs_page->wb_context,
 				count,
 				start,
 				IOMODE_RW,
@@ -1538,7 +1557,7 @@ pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
 		goto out;
 
 	/* Retrieve and set layout if not allready cached */
-	status = pnfs_update_layout(inode,
+	status = _pnfs_update_layout(inode,
 				    context,
 				    count,
 				    *pos,
@@ -1580,7 +1599,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
 		args->offset);
 
 	/* Retrieve and set layout if not allready cached */
-	status = pnfs_update_layout(inode,
+	status = _pnfs_update_layout(inode,
 				    args->context,
 				    args->count,
 				    args->offset,
@@ -1681,7 +1700,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
 		args->offset);
 
 	/* Retrieve and set layout if not allready cached */
-	status = pnfs_update_layout(inode,
+	status = _pnfs_update_layout(inode,
 				    args->context,
 				    args->count,
 				    args->offset,
@@ -1845,7 +1864,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
 	   new one.  If it was recalled we better commit the data first
 	   before returning it, otherwise the data needs to be rewritten,
 	   either with a new layout or to the MDS */
-	result = pnfs_update_layout(data->inode,
+	result = _pnfs_update_layout(data->inode,
 				    NULL,
 				    count,
 				    first->wb_offset,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 214d567..6326ed5 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -31,7 +31,8 @@ extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait
 /* pnfs.c */
 extern const nfs4_stateid zero_stateid;
 
-int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+void put_lseg(struct pnfs_layout_segment *lseg);
+int _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 	u64 count, loff_t pos, enum pnfs_iomode access_type,
 	struct pnfs_layout_segment **lsegpp);
 
@@ -81,6 +82,12 @@ static inline int lo_fail_bit(u32 iomode)
 			 NFS_INO_RW_LAYOUT_FAILED : NFS_INO_RO_LAYOUT_FAILED;
 }
 
+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+	if (lseg)
+		kref_get(&lseg->kref);
+}
+
 /* Return true if a layout driver is being used for this mountpoint */
 static inline int pnfs_enabled_sb(struct nfs_server *nfss)
 {
@@ -170,6 +177,23 @@ static inline int pnfs_return_layout(struct inode *ino,
 	return 0;
 }
 
+static inline int pnfs_update_layout(struct inode *ino,
+	struct nfs_open_context *ctx,
+	u64 count, loff_t pos, enum pnfs_iomode access_type,
+	struct pnfs_layout_segment **lsegpp)
+{
+	struct nfs_server *nfss = NFS_SERVER(ino);
+
+	if (pnfs_enabled_sb(nfss))
+		return _pnfs_update_layout(ino, ctx, count, pos,
+					   access_type, lsegpp);
+	else {
+		if (lsegpp)
+			*lsegpp = NULL;
+		return 0;
+	}
+}
+
 static inline int pnfs_get_write_status(struct nfs_write_data *data)
 {
 	return data->pdata.pnfs_error;
@@ -190,6 +214,24 @@ static inline int pnfs_use_rpc(struct nfs_server *nfss)
 
 #else  /* CONFIG_NFS_V4_1 */
 
+static inline void get_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline void put_lseg(struct pnfs_layout_segment *lseg)
+{
+}
+
+static inline int
+pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+	u64 count, loff_t pos, enum pnfs_iomode access_type,
+	struct pnfs_layout_segment **lsegpp)
+{
+	if (lsegpp)
+		*lsegpp = NULL;
+	return 0;
+}
+
 static inline enum pnfs_try_status
 pnfs_try_to_read_data(struct nfs_read_data *data,
 		      const struct rpc_call_ops *call_ops)
-- 
1.6.6.1



* [PATCH 12/24] pnfs_submit: stash and refcount lseg in read path
  2010-06-08  4:19                     ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Fred Isaman
@ 2010-06-08  4:19                       ` Fred Isaman
  2010-06-08  4:19                         ` [PATCH 13/24] pnfs_submit: read path changeover Fred Isaman
  2010-06-09 18:58                       ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Boaz Harrosh
  1 sibling, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Note we are not using it yet, but refcounting should be accurate.
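
The reference accounting this patch aims for can be modelled as below. It is a simplified sketch, not the kernel code: struct lseg, lseg_get, lseg_put, struct request, request_init and request_clear are hypothetical stand-ins for the layout segment, get_lseg/put_lseg, struct nfs_page, nfs_create_request and nfs_clear_request, and a plain int replaces the kref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a layout segment with a bare reference count. */
struct lseg {
	int refs;
};

/* NULL means "no layout, go through the MDS", so both helpers accept it. */
static void lseg_get(struct lseg *l)
{
	if (l)
		l->refs++;
}

static void lseg_put(struct lseg *l)
{
	if (l)
		l->refs--;
}

/* Hypothetical model of a page request that stashes its lseg. */
struct request {
	struct lseg *lseg;
};

/* Creating a request takes its own reference on the stashed lseg. */
static void request_init(struct request *r, struct lseg *l)
{
	r->lseg = l;
	lseg_get(l);
}

/* Clearing a request drops that reference exactly once. */
static void request_clear(struct request *r)
{
	if (r->lseg != NULL) {
		lseg_put(r->lseg);
		r->lseg = NULL;
	}
}
```

In this model the lookup holds one reference, each request adds one, and the count returns to zero only after the lookup's reference and every request's reference have been dropped.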

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pagelist.c        |   11 +++++++++--
 fs/nfs/pnfs.c            |    4 +++-
 fs/nfs/read.c            |    9 +++++++--
 fs/nfs/write.c           |    2 +-
 include/linux/nfs_page.h |    5 ++++-
 5 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 8314915..ed647b9 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -20,6 +20,7 @@
 #include <linux/nfs_mount.h>
 
 #include "internal.h"
+#include "pnfs.h"
 
 static struct kmem_cache *nfs_page_cachep;
 
@@ -56,7 +57,8 @@ nfs_page_free(struct nfs_page *p)
 struct nfs_page *
 nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 		   struct page *page,
-		   unsigned int offset, unsigned int count)
+		   unsigned int offset, unsigned int count,
+		   struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page		*req;
 
@@ -80,6 +82,8 @@ nfs_create_request(struct nfs_open_context *ctx, struct inode *inode,
 	req->wb_bytes   = count;
 	req->wb_context = get_nfs_open_context(ctx);
 	kref_init(&req->wb_kref);
+	req->wb_lseg    = lseg;
+	get_lseg(lseg);
 	return req;
 }
 
@@ -150,9 +154,12 @@ void nfs_clear_request(struct nfs_page *req)
 		put_nfs_open_context(ctx);
 		req->wb_context = NULL;
 	}
+	if (req->wb_lseg != NULL) {
+		put_lseg(req->wb_lseg);
+		req->wb_lseg = NULL;
+	}
 }
 
-
 /**
  * nfs_release_request - Release the count on an NFS read/write request
  * @req: request to release
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index a74a4b6..2b5f6fc 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1391,6 +1391,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 	pgio->pg_iswrite = 0;
 	pgio->pg_boundary = 0;
 	pgio->pg_test = NULL;
+	pgio->pg_lseg = NULL;
 
 	if (!pnfs_enabled_sb(nfss))
 		return;
@@ -1400,7 +1401,8 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 
 	if (count > 0) {
 		status = _pnfs_update_layout(inode, ctx, count,
-						loff, IOMODE_READ, NULL);
+					    loff, IOMODE_READ,
+					    &pgio->pg_lseg);
 		dprintk("%s virt update returned %d\n", __func__, status);
 		if (status != 0)
 			return;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 28c49f1..68b4ca8 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -121,11 +121,14 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 	LIST_HEAD(one_request);
 	struct nfs_page	*new;
 	unsigned int len;
+	struct pnfs_layout_segment *lseg;
 
 	len = nfs_page_length(page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
-	new = nfs_create_request(ctx, inode, page, 0, len);
+	pnfs_update_layout(inode, ctx, NFS4_MAX_UINT64, 0, IOMODE_READ, &lseg);
+	new = nfs_create_request(ctx, inode, page, 0, len, lseg);
+	put_lseg(lseg);
 	if (IS_ERR(new)) {
 		unlock_page(page);
 		return PTR_ERR(new);
@@ -606,7 +609,8 @@ readpage_async_filler(void *data, struct page *page)
 	if (len == 0)
 		return nfs_return_empty_page(page);
 
-	new = nfs_create_request(desc->ctx, inode, page, 0, len);
+	new = nfs_create_request(desc->ctx, inode, page, 0, len,
+				 desc->pgio->pg_lseg);
 	if (IS_ERR(new))
 		goto out_error;
 
@@ -673,6 +677,7 @@ int nfs_readpages(struct file *filp, struct address_space *mapping,
 	ret = read_cache_pages(mapping, pages, readpage_async_filler, &desc);
 
 	nfs_pageio_complete(&pgio);
+	put_lseg(pgio.pg_lseg);
 	npages = (pgio.pg_bytes_written + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
 	nfs_add_stats(inode, NFSIOS_READPAGES, npages);
 read_complete:
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 13319c8..d8c0453 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -653,7 +653,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 	req = nfs_try_to_update_request(inode, page, offset, bytes);
 	if (req != NULL)
 		goto out;
-	req = nfs_create_request(ctx, inode, page, offset, bytes);
+	req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
 	if (IS_ERR(req))
 		goto out;
 	error = nfs_inode_add_request(inode, req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index d04ebb2..18a455c 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -48,6 +48,7 @@ struct nfs_page {
 	struct kref		wb_kref;	/* reference count */
 	unsigned long		wb_flags;
 	struct nfs_writeverf	wb_verf;	/* Commit cookie */
+	struct pnfs_layout_segment *wb_lseg;	/* Pnfs layout info */
 };
 
 struct nfs_pageio_descriptor {
@@ -61,6 +62,7 @@ struct nfs_pageio_descriptor {
 	int			(*pg_doio)(struct inode *, struct list_head *, unsigned int, size_t, int);
 	int 			pg_ioflags;
 	int			pg_error;
+	struct pnfs_layout_segment *pg_lseg;
 #ifdef CONFIG_NFS_V4_1
 	int			pg_iswrite;
 	int			pg_boundary;
@@ -74,7 +76,8 @@ extern	struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
 					    struct inode *inode,
 					    struct page *page,
 					    unsigned int offset,
-					    unsigned int count);
+					    unsigned int count,
+					    struct pnfs_layout_segment *lseg);
 extern	void nfs_clear_request(struct nfs_page *req);
 extern	void nfs_release_request(struct nfs_page *req);
 
-- 
1.6.6.1



* [PATCH 13/24] pnfs_submit: read path changeover
  2010-06-08  4:19                       ` [PATCH 12/24] pnfs_submit: stash and refcount lseg in read path Fred Isaman
@ 2010-06-08  4:19                         ` Fred Isaman
  2010-06-08  4:19                           ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Fred Isaman
  2010-06-09 19:19                           ` [PATCH 13/24] pnfs_submit: read path changeover Boaz Harrosh
  0 siblings, 2 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Change readpages path to only call LAYOUTGET once.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pagelist.c |    2 ++
 fs/nfs/pnfs.c     |   37 +++++++------------------------------
 fs/nfs/pnfs.h     |   25 ++++++++++++++++---------
 3 files changed, 25 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ed647b9..c3e5a1f 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -253,6 +253,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
 		return 0;
 	if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
 		return 0;
+	if (req->wb_lseg != prev->wb_lseg)
+		return 0;
 #ifdef CONFIG_NFS_V4_1
 	if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
 		return 0;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 2b5f6fc..692a18e 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1689,7 +1689,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
 {
 	struct nfs_readargs *args = &rdata->args;
 	struct inode *inode = rdata->inode;
-	int numpages, status, pgcount, temp;
+	int numpages, pgcount, temp;
 	struct nfs_server *nfss = NFS_SERVER(inode);
 	struct nfs_inode *nfsi = NFS_I(inode);
 	struct pnfs_layout_segment *lseg;
@@ -1701,19 +1701,8 @@ pnfs_readpages(struct nfs_read_data *rdata)
 		args->count,
 		args->offset);
 
-	/* Retrieve and set layout if not allready cached */
-	status = _pnfs_update_layout(inode,
-				    args->context,
-				    args->count,
-				    args->offset,
-				    IOMODE_READ,
-				    &lseg);
-	if (status) {
-		dprintk("%s: Updating layout failed (%d), retry with NFS \n",
-			__func__, status);
-		trypnfs = PNFS_NOT_ATTEMPTED;
-		goto out;
-	}
+	lseg = rdata->req->wb_lseg;
+	get_lseg(lseg);
 
 	/* Determine number of pages. */
 	pgcount = args->pgbase + args->count;
@@ -1740,7 +1729,6 @@ pnfs_readpages(struct nfs_read_data *rdata)
 		rdata->pdata.lseg = NULL;
 		put_lseg(lseg);
 	}
- out:
 	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
 	return trypnfs;
 }
@@ -1749,21 +1737,10 @@ enum pnfs_try_status
 _pnfs_try_to_read_data(struct nfs_read_data *data,
 		       const struct rpc_call_ops *call_ops)
 {
-	struct inode *ino = data->inode;
-	struct nfs_server *nfss = NFS_SERVER(ino);
-
-	dprintk("--> %s\n", __func__);
-	/* Only create an rpc request if utilizing NFSv4 I/O */
-	if (!pnfs_enabled_sb(nfss) ||
-	    !nfss->pnfs_curr_ld->ld_io_ops->read_pagelist) {
-		dprintk("<-- %s: not using pnfs\n", __func__);
-		return PNFS_NOT_ATTEMPTED;
-	} else {
-		dprintk("%s: Utilizing pNFS I/O\n", __func__);
-		data->pdata.call_ops = call_ops;
-		data->pdata.pnfs_error = 0;
-		return pnfs_readpages(data);
-	}
+	dprintk("%s: Utilizing pNFS I/O\n", __func__);
+	data->pdata.call_ops = call_ops;
+	data->pdata.pnfs_error = 0;
+	return pnfs_readpages(data);
 }
 
 enum pnfs_try_status
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 6326ed5..816ebe1 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -94,22 +94,29 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
 	return nfss->pnfs_curr_ld != NULL;
 }
 
+static inline void _pnfs_clear_lseg_from_pages(struct list_head *head)
+{
+	struct nfs_page *req;
+
+	list_for_each_entry(req, head, wb_list) {
+		put_lseg(req->wb_lseg);
+		req->wb_lseg = NULL;
+	}
+}
+
 static inline enum pnfs_try_status
 pnfs_try_to_read_data(struct nfs_read_data *data,
 		      const struct rpc_call_ops *call_ops)
 {
-	struct inode *inode = data->inode;
-	struct nfs_server *nfss = NFS_SERVER(inode);
 	enum pnfs_try_status ret;
 
-	/* FIXME: read_pagelist should probably be mandated */
-	if (PNFS_EXISTS_LDIO_OP(nfss, read_pagelist))
-		ret = _pnfs_try_to_read_data(data, call_ops);
-	else
-		ret = PNFS_NOT_ATTEMPTED;
-
+	if (!data->req->wb_lseg)
+		return PNFS_NOT_ATTEMPTED;
+	ret = _pnfs_try_to_read_data(data, call_ops);
 	if (ret == PNFS_ATTEMPTED)
-		nfs_inc_stats(inode, NFSIOS_PNFS_READ);
+		nfs_inc_stats(data->inode, NFSIOS_PNFS_READ);
+	else
+		_pnfs_clear_lseg_from_pages(&data->pages);
 	return ret;
 }
 
-- 
1.6.6.1



* [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-08  4:19                         ` [PATCH 13/24] pnfs_submit: read path changeover Fred Isaman
@ 2010-06-08  4:19                           ` Fred Isaman
  2010-06-08  4:19                             ` [PATCH 15/24] pnfs_submit: stash and refcount lseg in write path Fred Isaman
                                               ` (2 more replies)
  2010-06-09 19:19                           ` [PATCH 13/24] pnfs_submit: read path changeover Boaz Harrosh
  1 sibling, 3 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Preparing for LAYOUTGET invocation in nfs_write_begin to be the
only invocation in the write path.

It isn't used at all yet, but it should be properly referenced/dereferenced.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/file.c |   16 +++++++++++++---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 03601d2..fde6cb5 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -420,6 +420,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 		file->f_path.dentry->d_name.name,
 		mapping->host->i_ino, len, (long long) pos);
 
+	pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
+			   (struct pnfs_layout_segment **) fsdata);
 start:
 	/*
 	 * Prevent starvation issues if someone is doing a consistency
@@ -428,11 +430,13 @@ start:
 	ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
 			nfs_wait_bit_killable, TASK_KILLABLE);
 	if (ret)
-		return ret;
+		goto out;
 
 	page = grab_cache_page_write_begin(mapping, index, flags);
-	if (!page)
-		return -ENOMEM;
+	if (!page) {
+		ret = -ENOMEM;
+		goto out;
+	}
 	*pagep = page;
 
 	ret = nfs_flush_incompatible(file, page);
@@ -447,6 +451,11 @@ start:
 		if (!ret)
 			goto start;
 	}
+ out:
+	if (ret) {
+		put_lseg(*fsdata);
+		*fsdata = NULL;
+	}
 	return ret;
 }
 
@@ -486,6 +495,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
 
 	unlock_page(page);
 	page_cache_release(page);
+	put_lseg(fsdata);
 
 	if (status < 0)
 		return status;
-- 
1.6.6.1



* [PATCH 15/24] pnfs_submit: stash and refcount lseg in write path
  2010-06-08  4:19                           ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Fred Isaman
@ 2010-06-08  4:19                             ` Fred Isaman
  2010-06-08  4:19                               ` [PATCH 16/24] pnfs_submit: remove pnfs_file_operations Fred Isaman
  2010-06-09 10:38                             ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Benny Halevy
  2010-06-09 19:33                             ` Boaz Harrosh
  2 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Store the lseg in each nfs_page.  Note this necessitates adding checks
for compatibility with the lsegs of pre-existing nfs_pages.
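
The compatibility rule can be sketched as a standalone predicate. This is an illustration only: struct req and needs_flush are hypothetical names, with opaque pointers standing in for the page, open context and layout segment that the real nfs_flush_incompatible compares.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced view of an existing request on a page. */
struct req {
	const void *page;
	const void *ctx;
	const void *lseg;   /* NULL means the request goes through the MDS */
};

/*
 * Return 1 if the existing request must be flushed before new data can
 * be added, i.e. if its page, open context or lseg differs from the new
 * write's.  No existing request means there is nothing to flush.
 */
static int needs_flush(const struct req *existing, const void *page,
		       const void *ctx, const void *lseg)
{
	if (!existing)
		return 0;
	return existing->page != page || existing->ctx != ctx ||
	       existing->lseg != lseg;
}
```

Note that a request created without a layout (lseg of NULL) is treated as incompatible with a write that holds one, and vice versa, which matches the plain pointer comparison added in the diff below.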

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/file.c          |   14 +++++++++-----
 fs/nfs/write.c         |   30 ++++++++++++++++++------------
 include/linux/nfs_fs.h |    8 ++++++--
 3 files changed, 33 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index fde6cb5..7cdc2b7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -420,7 +420,9 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
 		file->f_path.dentry->d_name.name,
 		mapping->host->i_ino, len, (long long) pos);
 
-	pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
+	pnfs_update_layout(mapping->host,
+			   nfs_file_open_context(file),
+			   NFS4_MAX_UINT64, 0, IOMODE_RW,
 			   (struct pnfs_layout_segment **) fsdata);
 start:
 	/*
@@ -439,7 +441,7 @@ start:
 	}
 	*pagep = page;
 
-	ret = nfs_flush_incompatible(file, page);
+	ret = nfs_flush_incompatible(file, page, *fsdata);
 	if (ret) {
 		unlock_page(page);
 		page_cache_release(page);
@@ -491,7 +493,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
 			zero_user_segment(page, pglen, PAGE_CACHE_SIZE);
 	}
 
-	status = nfs_updatepage(file, page, offset, copied);
+	status = nfs_updatepage(file, page, offset, copied, fsdata);
 
 	unlock_page(page);
 	page_cache_release(page);
@@ -598,6 +600,8 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	/* make sure the cache has finished storing the page */
 	nfs_fscache_wait_on_page_write(NFS_I(dentry->d_inode), page);
 
+	/* XXX Do we want to call pnfs_update_layout here? */
+
 	lock_page(page);
 	mapping = page->mapping;
 	if (mapping != dentry->d_inode->i_mapping)
@@ -608,11 +612,11 @@ static int nfs_vm_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
 	if (pagelen == 0)
 		goto out_unlock;
 
-	ret = nfs_flush_incompatible(filp, page);
+	ret = nfs_flush_incompatible(filp, page, NULL);
 	if (ret != 0)
 		goto out_unlock;
 
-	ret = nfs_updatepage(filp, page, 0, pagelen);
+	ret = nfs_updatepage(filp, page, 0, pagelen, NULL);
 out_unlock:
 	if (!ret)
 		return VM_FAULT_LOCKED;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d8c0453..e2fbddb 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -570,7 +570,8 @@ static inline int nfs_scan_commit(struct inode *inode, struct list_head *dst, pg
 static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 		struct page *page,
 		unsigned int offset,
-		unsigned int bytes)
+		unsigned int bytes,
+		struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page *req;
 	unsigned int rqend;
@@ -595,8 +596,8 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 		 * Note: nfs_flush_incompatible() will already
 		 * have flushed out requests having wrong owners.
 		 */
-		if (offset > rqend
-		    || end < req->wb_offset)
+		if (offset > rqend || end < req->wb_offset ||
+		    req->wb_lseg != lseg)
 			goto out_flushme;
 
 		if (nfs_set_page_tag_locked(req))
@@ -644,16 +645,17 @@ out_err:
  * already called nfs_flush_incompatible() if necessary.
  */
 static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
-		struct page *page, unsigned int offset, unsigned int bytes)
+		struct page *page, unsigned int offset, unsigned int bytes,
+		struct pnfs_layout_segment *lseg)
 {
 	struct inode *inode = page->mapping->host;
 	struct nfs_page	*req;
 	int error;
 
-	req = nfs_try_to_update_request(inode, page, offset, bytes);
+	req = nfs_try_to_update_request(inode, page, offset, bytes, lseg);
 	if (req != NULL)
 		goto out;
-	req = nfs_create_request(ctx, inode, page, offset, bytes, NULL);
+	req = nfs_create_request(ctx, inode, page, offset, bytes, lseg);
 	if (IS_ERR(req))
 		goto out;
 	error = nfs_inode_add_request(inode, req);
@@ -666,11 +668,12 @@ out:
 }
 
 static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
-		unsigned int offset, unsigned int count)
+			       unsigned int offset, unsigned int count,
+			       struct pnfs_layout_segment *lseg)
 {
 	struct nfs_page	*req;
 
-	req = nfs_setup_write_request(ctx, page, offset, count);
+	req = nfs_setup_write_request(ctx, page, offset, count, lseg);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 	nfs_mark_request_dirty(req);
@@ -682,7 +685,8 @@ static int nfs_writepage_setup(struct nfs_open_context *ctx, struct page *page,
 	return 0;
 }
 
-int nfs_flush_incompatible(struct file *file, struct page *page)
+int nfs_flush_incompatible(struct file *file, struct page *page,
+			   struct pnfs_layout_segment *lseg)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
 	struct nfs_page	*req;
@@ -699,7 +703,8 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 		req = nfs_page_find_request(page);
 		if (req == NULL)
 			return 0;
-		do_flush = req->wb_page != page || req->wb_context != ctx;
+		do_flush = req->wb_page != page || req->wb_context != ctx ||
+			req->wb_lseg != lseg;
 		nfs_release_request(req);
 		if (!do_flush)
 			return 0;
@@ -726,7 +731,8 @@ static int nfs_write_pageuptodate(struct page *page, struct inode *inode)
  * things with a page scheduled for an RPC call (e.g. invalidate it).
  */
 int nfs_updatepage(struct file *file, struct page *page,
-		unsigned int offset, unsigned int count)
+		   unsigned int offset, unsigned int count,
+		   struct pnfs_layout_segment *lseg)
 {
 	struct nfs_open_context *ctx = nfs_file_open_context(file);
 	struct inode	*inode = page->mapping->host;
@@ -751,7 +757,7 @@ int nfs_updatepage(struct file *file, struct page *page,
 		offset = 0;
 	}
 
-	status = nfs_writepage_setup(ctx, page, offset, count);
+	status = nfs_writepage_setup(ctx, page, offset, count, lseg);
 	if (status < 0)
 		nfs_set_pageerror(page);
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index ee45eac..0de7847 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -512,8 +512,12 @@ extern void nfs_unblock_sillyrename(struct dentry *dentry);
 extern int  nfs_congestion_kb;
 extern int  nfs_writepage(struct page *page, struct writeback_control *wbc);
 extern int  nfs_writepages(struct address_space *, struct writeback_control *);
-extern int  nfs_flush_incompatible(struct file *file, struct page *page);
-extern int  nfs_updatepage(struct file *, struct page *, unsigned int, unsigned int);
+struct pnfs_layout_segment;
+extern int  nfs_flush_incompatible(struct file *file, struct page *page,
+				   struct pnfs_layout_segment *lseg);
+extern int  nfs_updatepage(struct file *, struct page *,
+			   unsigned int offset, unsigned int count,
+			   struct pnfs_layout_segment *lseg);
 extern int nfs_writeback_done(struct rpc_task *, struct nfs_write_data *);
 
 /*
-- 
1.6.6.1



* [PATCH 16/24] pnfs_submit: remove pnfs_file_operations
  2010-06-08  4:19                             ` [PATCH 15/24] pnfs_submit: stash and refcount lseg in write path Fred Isaman
@ 2010-06-08  4:19                               ` Fred Isaman
  2010-06-08  4:19                                 ` [PATCH 17/24] pnfs_submit: remove pnfs_update_layout_commit Fred Isaman
  2010-06-08  7:34                                 ` [PATCH 16/24] pnfs_submit: remove pnfs_file_operations Christoph Hellwig
  0 siblings, 2 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

pnfs_writepages is useful, but not necessary, for determining size
parameters for LAYOUTGET.

Also, the pnfs_file_operations were getting out of sync with
nfs_file_operations (see commits e1ebfd33be068 and bf40d3435caf49369).

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/file.c          |   24 ------------------------
 fs/nfs/nfs4proc.c      |    1 -
 fs/nfs/pnfs.c          |   35 -----------------------------------
 fs/nfs/pnfs.h          |    1 -
 include/linux/nfs_fs.h |    3 ---
 5 files changed, 0 insertions(+), 64 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 7cdc2b7..d796156 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -82,30 +82,6 @@ const struct file_operations nfs_file_operations = {
 	.setlease	= nfs_setlease,
 };
 
-#ifdef CONFIG_NFS_V4_1
-const struct file_operations pnfs_file_operations = {
-	.llseek		= nfs_file_llseek,
-	.read		= do_sync_read,
-	.write		= pnfs_file_write,
-	.aio_read	= nfs_file_read,
-	.aio_write	= nfs_file_write,
-#ifdef CONFIG_MMU
-	.mmap		= nfs_file_mmap,
-#else
-	.mmap		= generic_file_mmap,
-#endif
-	.open		= nfs_file_open,
-	.flush		= nfs_file_flush,
-	.release	= nfs_file_release,
-	.fsync		= nfs_file_fsync,
-	.lock		= nfs_lock,
-	.flock		= nfs_flock,
-	.splice_read	= nfs_file_splice_read,
-	.check_flags	= nfs_check_flags,
-	.setlease	= nfs_setlease,
-};
-#endif /* CONFIG_NFS_V4_1 */
-
 const struct inode_operations nfs_file_inode_operations = {
 	.permission	= nfs_permission,
 	.getattr	= nfs_getattr,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index ee3e3bc..aa4581a 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5981,7 +5981,6 @@ pnfs_v4_clientops_init(void)
 	struct nfs_rpc_ops *p = (struct nfs_rpc_ops *)&pnfs_v4_clientops;
 
 	memcpy(p, &nfs_v4_clientops, sizeof(*p));
-	p->file_ops		= &pnfs_file_operations;
 	p->setattr		= pnfs4_proc_setattr;
 	p->read_done		= pnfs4_read_done;
 	p->write_setup		= pnfs4_proc_write_setup;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 692a18e..456b057 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1538,41 +1538,6 @@ pnfs_writeback_done(struct nfs_write_data *data)
 }
 
 /*
- * Obtain a layout for the the write range, and call do_sync_write.
- *
- * Unlike the read path which can wait until page coalescing
- * (pnfs_pageio_init_read) to get a layout, the write path discards the
- * request range to form the address_mapping - so we get a layout in
- * the file operations write method.
- *
- * If pnfs_update_layout fails, pages will be coalesced for MDS I/O.
- */
-ssize_t
-pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
-		loff_t *pos)
-{
-	struct inode *inode = filp->f_dentry->d_inode;
-	struct nfs_open_context *context = filp->private_data;
-	int status;
-
-	if (!pnfs_enabled_sb(NFS_SERVER(inode)))
-		goto out;
-
-	/* Retrieve and set layout if not allready cached */
-	status = _pnfs_update_layout(inode,
-				    context,
-				    count,
-				    *pos,
-				    IOMODE_RW,
-				    NULL);
-	if (status)
-		dprintk("%s: Unable to get a layout for %Zu@%llu iomode %d)\n",
-			__func__, count, *pos, IOMODE_RW);
-out:
-	return do_sync_write(filp, buf, count, pos);
-}
-
-/*
  * Call the appropriate parallel I/O subsystem write function.
  * If no I/O device driver exists, or one does match the returned
  * fstype, then return a positive status for regular NFS processing.
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 816ebe1..e463c76 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -59,7 +59,6 @@ void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
 			   struct nfs_open_context *, struct list_head *);
 void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
 void pnfs_update_layout_commit(struct inode *, struct list_head *, pgoff_t, unsigned int);
-ssize_t pnfs_file_write(struct file *, const char __user *, size_t, loff_t *);
 void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
 int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
 void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 0de7847..41026cb 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -411,9 +411,6 @@ extern const struct inode_operations nfs3_file_inode_operations;
 #endif /* CONFIG_NFS_V3 */
 extern const struct file_operations nfs_file_operations;
 extern const struct address_space_operations nfs_file_aops;
-#ifdef CONFIG_NFS_V4_1
-extern const struct file_operations pnfs_file_operations;
-#endif /* CONFIG_NFS_V4_1 */
 
 static inline struct nfs_open_context *nfs_file_open_context(struct file *filp)
 {
-- 
1.6.6.1



* [PATCH 17/24] pnfs_submit: remove pnfs_update_layout_commit
  2010-06-08  4:19                               ` [PATCH 16/24] pnfs_submit: remove pnfs_file_operations Fred Isaman
@ 2010-06-08  4:19                                 ` Fred Isaman
  2010-06-08  4:19                                   ` [PATCH 18/24] pnfs_submit: remove pnfs_writepages LAYOUTGET invocation Fred Isaman
  2010-06-08  7:34                                 ` [PATCH 16/24] pnfs_submit: remove pnfs_file_operations Christoph Hellwig
  1 sibling, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

This seems completely extraneous.  Also note this was being
called from within a spinlock.

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pnfs.c  |   39 ---------------------------------------
 fs/nfs/pnfs.h  |    1 -
 fs/nfs/write.c |    8 +-------
 3 files changed, 1 insertions(+), 47 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 456b057..3d5c17b 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1428,45 +1428,6 @@ pnfs_pageio_init_write(struct nfs_pageio_descriptor *pgio, struct inode *inode)
 	pnfs_set_pg_test(inode, pgio);
 }
 
-/*
- * Get a layoutout for COMMIT
- */
-void
-pnfs_update_layout_commit(struct inode *inode,
-			struct list_head *head,
-			pgoff_t idx_start,
-			unsigned int npages)
-{
-	struct nfs_server *nfss = NFS_SERVER(inode);
-	struct nfs_page *nfs_page = nfs_list_entry(head->next);
-	u64 count;
-	loff_t start;
-	int status;
-
-	dprintk("--> %s inode %p layout range: %Zd@%llu\n", __func__, inode,
-		(size_t)(npages * PAGE_CACHE_SIZE),
-		(u64)((u64)idx_start << PAGE_CACHE_SHIFT));
-
-	if (!pnfs_enabled_sb(nfss))
-		return;
-
-	/* COMMIT indicates the whole file with offset = count = 0
-	 * whereas layout segments indicate whole file with offset = 0,
-	 * count = NFS4_MAX_UINT64.
-	 */
-	count = (size_t)npages * PAGE_CACHE_SIZE;
-	start = (loff_t)idx_start <<  PAGE_CACHE_SHIFT;
-	if (start == 0 && count == 0)
-		count = NFS4_MAX_UINT64;
-
-	status = _pnfs_update_layout(inode, nfs_page->wb_context,
-				count,
-				start,
-				IOMODE_RW,
-				NULL);
-	dprintk("%s  virt update status %d\n", __func__, status);
-}
-
 static int
 pnfs_call_done(struct pnfs_call_data *pdata, struct rpc_task *task, void *data)
 {
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index e463c76..339485d 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -58,7 +58,6 @@ enum pnfs_try_status _pnfs_try_to_commit(struct nfs_write_data *,
 void pnfs_pageio_init_read(struct nfs_pageio_descriptor *, struct inode *,
 			   struct nfs_open_context *, struct list_head *);
 void pnfs_pageio_init_write(struct nfs_pageio_descriptor *, struct inode *);
-void pnfs_update_layout_commit(struct inode *, struct list_head *, pgoff_t, unsigned int);
 void pnfs_get_layout_done(struct nfs4_pnfs_layoutget *, int rpc_status);
 int pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp);
 void pnfs_layout_release(struct pnfs_layout_type *, struct nfs4_pnfs_layout_segment *range);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e2fbddb..960006e 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -538,14 +538,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
 	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
 	if (ret > 0)
 		nfsi->ncommit -= ret;
-	if (nfs_need_commit(NFS_I(inode))) {
+	if (nfs_need_commit(NFS_I(inode)))
 		__mark_inode_dirty(inode, I_DIRTY_DATASYNC);
-#ifdef CONFIG_NFS_V4_1
-		/* FIXME: change pnfs_update_layout_commit to derive
-		   idx_start from head of list and pass ret rather than npages */
-		pnfs_update_layout_commit(inode, dst, idx_start, npages);
-#endif /* CONFIG_NFS_V4_1 */
-	}
 	return ret;
 }
 #else
-- 
1.6.6.1



* [PATCH 18/24] pnfs_submit: remove pnfs_writepages LAYOUTGET invocation
  2010-06-08  4:19                                 ` [PATCH 17/24] pnfs_submit: remove pnfs_update_layout_commit Fred Isaman
@ 2010-06-08  4:19                                   ` Fred Isaman
  2010-06-08  4:19                                     ` [PATCH 19/24] pnfs: export some commit error handling for use by layout drivers Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pnfs.c |   37 +++++++------------------------------
 fs/nfs/pnfs.h |   15 ++++++---------
 2 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 3d5c17b..4907e3a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1514,7 +1514,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
 {
 	struct nfs_writeargs *args = &wdata->args;
 	struct inode *inode = wdata->inode;
-	int numpages, status;
+	int numpages;
 	enum pnfs_try_status trypnfs;
 	struct nfs_server *nfss = NFS_SERVER(inode);
 	struct nfs_inode *nfsi = NFS_I(inode);
@@ -1526,19 +1526,8 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
 		args->count,
 		args->offset);
 
-	/* Retrieve and set layout if not allready cached */
-	status = _pnfs_update_layout(inode,
-				    args->context,
-				    args->count,
-				    args->offset,
-				    IOMODE_RW,
-				    &lseg);
-	if (status) {
-		dprintk("%s: Updating layout failed (%d), retry with NFS \n",
-			__func__, status);
-		trypnfs = PNFS_NOT_ATTEMPTED;	/* retry with nfs I/O */
-		goto out;
-	}
+	lseg = wdata->req->wb_lseg;
+	get_lseg(lseg);
 
 	/* Determine number of pages
 	 */
@@ -1566,7 +1555,6 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
 		wdata->pdata.lseg = NULL;
 		put_lseg(lseg);
 	}
-out:
 	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
 	return trypnfs;
 }
@@ -1673,22 +1661,11 @@ enum pnfs_try_status
 _pnfs_try_to_write_data(struct nfs_write_data *data,
 			const struct rpc_call_ops *call_ops, int how)
 {
-	struct inode *ino = data->inode;
-	struct nfs_server *nfss = NFS_SERVER(ino);
-
 	dprintk("--> %s\n", __func__);
-	/* Only create an rpc request if utilizing NFSv4 I/O */
-	if (!pnfs_enabled_sb(nfss) ||
-	    !nfss->pnfs_curr_ld->ld_io_ops->write_pagelist) {
-		dprintk("<-- %s: not using pnfs\n", __func__);
-		return PNFS_NOT_ATTEMPTED;
-	} else {
-		dprintk("%s: Utilizing pNFS I/O\n", __func__);
-		data->pdata.call_ops = call_ops;
-		data->pdata.pnfs_error = 0;
-		data->pdata.how = how;
-		return pnfs_writepages(data, how);
-	}
+	data->pdata.call_ops = call_ops;
+	data->pdata.pnfs_error = 0;
+	data->pdata.how = how;
+	return pnfs_writepages(data, how);
 }
 
 enum pnfs_try_status
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 339485d..ea54210 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -123,18 +123,15 @@ pnfs_try_to_write_data(struct nfs_write_data *data,
 		       const struct rpc_call_ops *call_ops,
 		       int how)
 {
-	struct inode *inode = data->inode;
-	struct nfs_server *nfss = NFS_SERVER(inode);
 	enum pnfs_try_status ret;
 
-	/* FIXME: write_pagelist should probably be mandated */
-	if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
-		ret = _pnfs_try_to_write_data(data, call_ops, how);
-	else
-		ret = PNFS_NOT_ATTEMPTED;
-
+	if (!data->req->wb_lseg)
+		return PNFS_NOT_ATTEMPTED;
+	ret = _pnfs_try_to_write_data(data, call_ops, how);
 	if (ret == PNFS_ATTEMPTED)
-		nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
+		nfs_inc_stats(data->inode, NFSIOS_PNFS_WRITE);
+	else
+		_pnfs_clear_lseg_from_pages(&data->pages);
 	return ret;
 }
 
-- 
1.6.6.1



* [PATCH 19/24] pnfs: export some commit error handling for use by layout drivers
  2010-06-08  4:19                                   ` [PATCH 18/24] pnfs_submit: remove pnfs_writepages LAYOUTGET invocation Fred Isaman
@ 2010-06-08  4:19                                     ` Fred Isaman
  2010-06-08  4:19                                       ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

There exists code to deal with a memory error during commit before the
RPC has been sent.  Separate this out and export it for later use by the
filelayout driver.
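
As a rough illustration of the refactoring (a user-space sketch, not
the kernel code): the error path drains the list and re-marks every
request for a later commit, and pulling that loop into its own helper
lets another caller reuse it.  All model_* names below are hypothetical
stand-ins for nfs_page, the request list, and nfs_commit_list(); the
zone/bdi accounting is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_page; not the kernel type. */
struct model_req {
	struct model_req *next;
	int marked_for_commit;	/* models nfs_mark_request_commit() */
	int locked;		/* models the page tag lock */
};

/* Hypothetical stand-in for the list_head of pending requests. */
struct model_list {
	struct model_req *first;
};

/*
 * Models nfs_mark_list_commit(): remove each request from the list,
 * mark it as still needing a commit, and drop its lock.
 * Returns the number of requests handled.
 */
static int model_mark_list_commit(struct model_list *head)
{
	int n = 0;

	while (head->first != NULL) {
		struct model_req *req = head->first;

		head->first = req->next;
		req->next = NULL;
		req->marked_for_commit = 1;
		req->locked = 0;
		n++;
	}
	return n;
}

/*
 * Models the nfs_commit_list() error path: when allocating the commit
 * data fails, hand the whole list to the shared helper.
 */
static int model_commit_list(struct model_list *head, int alloc_ok)
{
	if (!alloc_ok) {
		model_mark_list_commit(head);
		return -ENOMEM;
	}
	return 0;
}
```

With the loop in one place, a layout driver that hits the same
allocation failure can call the helper instead of open-coding it.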

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/internal.h |    1 +
 fs/nfs/write.c    |   29 ++++++++++++++++++-----------
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 517aa0b..b754446 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -288,6 +288,7 @@ extern int pnfs_initiate_commit(struct nfs_write_data *data,
 			       const struct rpc_call_ops *call_ops,
 			       int how);
 extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
+extern void nfs_mark_list_commit(struct list_head *head);
 #ifdef CONFIG_MIGRATION
 extern int nfs_migrate_page(struct address_space *,
 		struct page *, struct page *);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 960006e..811c776 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1377,6 +1377,23 @@ static int nfs_commit_rpcsetup(struct list_head *head,
 				    how);
 }
 
+/* Handle memory error during commit */
+void nfs_mark_list_commit(struct list_head *head)
+{
+	struct nfs_page         *req;
+
+	while (!list_empty(head)) {
+		req = nfs_list_entry(head->next);
+		nfs_list_remove_request(req);
+		nfs_mark_request_commit(req);
+		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
+		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
+				BDI_RECLAIMABLE);
+		nfs_clear_page_tag_locked(req);
+	}
+}
+EXPORT_SYMBOL(nfs_mark_list_commit);
+
 /*
  * Commit dirty pages
  */
@@ -1384,25 +1401,15 @@ static int
 nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 {
 	struct nfs_write_data	*data;
-	struct nfs_page         *req;
 
 	data = nfs_commitdata_alloc();
-
 	if (!data)
 		goto out_bad;
 
 	/* Set up the argument struct */
 	return nfs_commit_rpcsetup(head, data, how);
  out_bad:
-	while (!list_empty(head)) {
-		req = nfs_list_entry(head->next);
-		nfs_list_remove_request(req);
-		nfs_mark_request_commit(req);
-		dec_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
-		dec_bdi_stat(req->wb_page->mapping->backing_dev_info,
-				BDI_RECLAIMABLE);
-		nfs_clear_page_tag_locked(req);
-	}
+	nfs_mark_list_commit(head);
 	nfs_commit_clear_lock(NFS_I(inode));
 	return -ENOMEM;
 }
-- 
1.6.6.1



* [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation
  2010-06-08  4:19                                     ` [PATCH 19/24] pnfs: export some commit error handling for use by layout drivers Fred Isaman
@ 2010-06-08  4:19                                       ` Fred Isaman
  2010-06-08  4:19                                         ` [PATCH 21/24] pnfs_submit: filelayout: rewrite filelayout_commit to use new API Fred Isaman
  2010-06-09  9:09                                         ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Benny Halevy
  0 siblings, 2 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

WARNING - this is an API change.

The layout driver's commit operation no longer takes an lseg.
This is because each nfs_page may or may not have an associated lseg.
It is the layout driver's task to send commits to the appropriate place.
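
A minimal user-space sketch of the new convention, under the assumption
that a NULL lseg means "commit through the MDS": the scan raises a flag
as soon as any page carries an lseg, and the driver then decides per
page where the commit goes.  The model_* types and functions are
hypothetical and only mirror the shape of nfs_scan_list() and a
driver's commit routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pnfs_layout_segment. */
struct model_lseg {
	int id;
};

/* Hypothetical stand-in for struct nfs_page. */
struct model_page {
	const struct model_lseg *lseg;	/* NULL: go through the MDS */
};

/*
 * Models the use_pnfs out-parameter: set to 1 if at least one page
 * has an lseg attached.  The flag is only ever raised here, so the
 * caller must initialize it to 0.
 */
static int model_scan(const struct model_page *pages, size_t n,
		      int *use_pnfs)
{
	size_t i;
	int res = 0;

	for (i = 0; i < n; i++) {
		if (pages[i].lseg != NULL)
			*use_pnfs = 1;
		res++;
	}
	return res;
}

/*
 * Models the driver side: with no single lseg passed in, the driver
 * inspects each page and counts how many commits go to the MDS and
 * how many go to data servers.
 */
static void model_driver_commit(const struct model_page *pages, size_t n,
				size_t *to_mds, size_t *to_ds)
{
	size_t i;

	*to_mds = 0;
	*to_ds = 0;
	for (i = 0; i < n; i++) {
		if (pages[i].lseg == NULL)
			(*to_mds)++;
		else
			(*to_ds)++;
	}
}
```

If the flag stays 0, the generic code can skip the layout driver
entirely and issue a plain NFS commit.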

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/internal.h        |    2 +-
 fs/nfs/pagelist.c        |    5 ++-
 fs/nfs/pnfs.c            |   79 ++++++++-------------------------------------
 fs/nfs/pnfs.h            |   21 +++++-------
 fs/nfs/write.c           |   23 +++++++------
 include/linux/nfs_page.h |    3 +-
 6 files changed, 43 insertions(+), 90 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index b754446..a30974a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -286,7 +286,7 @@ extern int nfs_initiate_commit(struct nfs_write_data *data,
 extern int pnfs_initiate_commit(struct nfs_write_data *data,
 			       struct rpc_clnt *clnt,
 			       const struct rpc_call_ops *call_ops,
-			       int how);
+				int how, int pnfs);
 extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
 extern void nfs_mark_list_commit(struct list_head *head);
 #ifdef CONFIG_MIGRATION
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index c3e5a1f..c8de900 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -380,6 +380,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
  * @idx_start: lower bound of page->index to scan
  * @npages: idx_start + npages sets the upper bound to scan.
  * @tag: tag to scan for
+ * @use_pnfs: will be set TRUE if commit needs to be handled by layout driver
  *
  * Moves elements from one of the inode request lists.
  * If the number of requests is set to 0, the entire address_space
@@ -389,7 +390,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
  */
 int nfs_scan_list(struct nfs_inode *nfsi,
 		struct list_head *dst, pgoff_t idx_start,
-		unsigned int npages, int tag)
+		  unsigned int npages, int tag, int *use_pnfs)
 {
 	struct nfs_page *pgvec[NFS_SCAN_MAXENTRIES];
 	struct nfs_page *req;
@@ -420,6 +421,8 @@ int nfs_scan_list(struct nfs_inode *nfsi,
 				radix_tree_tag_clear(&nfsi->nfs_page_tree,
 						req->wb_index, tag);
 				nfs_list_add_request(req, dst);
+				if (req->wb_lseg)
+					*use_pnfs = 1;
 				res++;
 				if (res == INT_MAX)
 					goto out;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 4907e3a..9f28b28 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1672,19 +1672,11 @@ enum pnfs_try_status
 _pnfs_try_to_commit(struct nfs_write_data *data,
 		    const struct rpc_call_ops *call_ops, int how)
 {
-	struct inode *inode = data->inode;
-
-	if (!pnfs_enabled_sb(NFS_SERVER(inode))) {
-		dprintk("%s: Not using pNFS I/O\n", __func__);
-		return PNFS_NOT_ATTEMPTED;
-	} else {
-		/* data->call_ops and data->how set in nfs_commit_rpcsetup */
-		dprintk("%s: Utilizing pNFS I/O\n", __func__);
-		data->pdata.call_ops = call_ops;
-		data->pdata.pnfs_error = 0;
-		data->pdata.how = how;
-		return pnfs_commit(data, how);
-	}
+	dprintk("%s: Utilizing pNFS I/O\n", __func__);
+	data->pdata.call_ops = call_ops;
+	data->pdata.pnfs_error = 0;
+	data->pdata.how = how;
+	return pnfs_commit(data, how);
 }
 
 /* pNFS Commit callback function for all layout drivers */
@@ -1705,76 +1697,33 @@ pnfs_commit_done(struct nfs_write_data *data)
 		_pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
 				    true);
 		pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
-				     pdata->call_ops, pdata->how);
+				     pdata->call_ops, pdata->how, 1);
 	}
 }
 
 static enum pnfs_try_status
 pnfs_commit(struct nfs_write_data *data, int sync)
 {
-	int result;
 	struct nfs_inode *nfsi = NFS_I(data->inode);
 	struct nfs_server *nfss = NFS_SERVER(data->inode);
-	struct pnfs_layout_segment *lseg;
-	struct nfs_page *first, *last, *p;
-	int npages;
 	enum pnfs_try_status trypnfs;
-	u64 count;
 
 	dprintk("%s: Begin\n", __func__);
 
-	/* If the layout driver doesn't define its own commit function
-	 * use standard NFSv4 commit
-	 */
-	first = last = nfs_list_entry(data->pages.next);
-	npages = 0;
-	list_for_each_entry(p, &data->pages, wb_list) {
-		last = p;
-		npages++;
-	}
-	/* COMMIT indicates the whole file with offset = count = 0
-	 * whereas layout segments indicate whole file with offset = 0,
-	 * count = NFS4_MAX_UINT64.
+	/* We need to account for possibility that
+	 * each nfs_page can point to a different lseg (or be NULL).
+	 * For the immediate case of whole-file-only layouts, we at
+	 * least know there can be only a single lseg.
+	 * We still have to account for the possibility of some being NULL.
+	 * This will be done by passing the buck to the layout driver.
 	 */
-	count = ((npages - 1) << PAGE_CACHE_SHIFT) + first->wb_bytes +
-		 (first != last) ? last->wb_bytes : 0;
-	if (first->wb_offset == 0 && count == 0)
-		count = NFS4_MAX_UINT64;
-
-	/* FIXME: we really ought to keep the layout segment that we used
-	   to write the page around for committing it and never ask for a
-	   new one.  If it was recalled we better commit the data first
-	   before returning it, otherwise the data needs to be rewritten,
-	   either with a new layout or to the MDS */
-	result = _pnfs_update_layout(data->inode,
-				    NULL,
-				    count,
-				    first->wb_offset,
-				    IOMODE_RW,
-				    &lseg);
-	/* If no layout have been retrieved,
-	 * use standard NFSv4 commit
-	 */
-	if (result) {
-		dprintk("%s: Updating layout failed (%d), retry with NFS \n",
-			__func__, result);
-		trypnfs = PNFS_NOT_ATTEMPTED;
-		goto out;
-	}
-
-	dprintk("%s: Calling layout driver commit\n", __func__);
+	data->pdata.lseg = NULL;
 	if (!pnfs_use_rpc(nfss))
 		data->pdata.pnfsflags |= PNFS_NO_RPC;
-	data->pdata.lseg = lseg;
 	trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
 							sync, data);
-	if (trypnfs == PNFS_NOT_ATTEMPTED) {
+	if (trypnfs == PNFS_NOT_ATTEMPTED)
 		data->pdata.pnfsflags &= ~PNFS_NO_RPC;
-		data->pdata.lseg = NULL;
-		put_lseg(lseg);
-	}
-
-out:
 	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
 	return trypnfs;
 }
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index ea54210..e231ca3 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -140,21 +140,18 @@ pnfs_try_to_commit(struct nfs_write_data *data,
 		   const struct rpc_call_ops *call_ops,
 		   int how)
 {
-	struct inode *inode = data->inode;
-	struct nfs_server *nfss = NFS_SERVER(inode);
 	enum pnfs_try_status ret;
 
-	/* Note that we check for "write_pagelist" and not for "commit"
-	   since if async writes were done and pages weren't marked as stable
-	   the commit method MUST be defined by the LD */
-	/* FIXME: write_pagelist should probably be mandated */
-	if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
-		ret = _pnfs_try_to_commit(data, call_ops, how);
-	else
-		ret = PNFS_NOT_ATTEMPTED;
-
+	/* Unlike in pnfs_try_to_write_data and pnfs_try_to_read_data,
+	 * we have no guarantee that all nfs_pages point to the same
+	 * lseg.  However, if we reach here, we are guaranteed that at
+	 * least one points to some lseg.
+	 */
+	ret = _pnfs_try_to_commit(data, call_ops, how);
 	if (ret == PNFS_ATTEMPTED)
-		nfs_inc_stats(inode, NFSIOS_PNFS_COMMIT);
+		nfs_inc_stats(data->inode, NFSIOS_PNFS_COMMIT);
+	else
+		_pnfs_clear_lseg_from_pages(&data->pages);
 	return ret;
 }
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 811c776..ebc9452 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -527,7 +527,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
  * The requests are *not* checked to ensure that they form a contiguous set.
  */
 static int
-nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
+nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages, int *use_pnfs)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	int ret;
@@ -535,7 +535,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
 	if (!nfs_need_commit(nfsi))
 		return 0;
 
-	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
+	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT,
+			    use_pnfs);
 	if (ret > 0)
 		nfsi->ncommit -= ret;
 	if (nfs_need_commit(NFS_I(inode)))
@@ -1334,9 +1335,10 @@ EXPORT_SYMBOL(nfs_initiate_commit);
 int pnfs_initiate_commit(struct nfs_write_data *data,
 			 struct rpc_clnt *clnt,
 			 const struct rpc_call_ops *call_ops,
-			 int how)
+			 int how, int pnfs)
 {
-	if (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED)
+	if (pnfs &&
+	    (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED))
 		return pnfs_get_write_status(data);
 
 	return nfs_initiate_commit(data, clnt, &nfs_commit_ops, how);
@@ -1347,7 +1349,7 @@ int pnfs_initiate_commit(struct nfs_write_data *data,
  */
 static int nfs_commit_rpcsetup(struct list_head *head,
 		struct nfs_write_data *data,
-		int how)
+		int how, int pnfs)
 {
 	struct nfs_page *first = nfs_list_entry(head->next);
 	struct inode *inode = first->wb_context->path.dentry->d_inode;
@@ -1374,7 +1376,7 @@ static int nfs_commit_rpcsetup(struct list_head *head,
 	data->args.context = first->wb_context;  /* used by commit done */
 
 	return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
-				    how);
+				    how, pnfs);
 }
 
 /* Handle memory error during commit */
@@ -1398,7 +1400,7 @@ EXPORT_SYMBOL(nfs_mark_list_commit);
  * Commit dirty pages
  */
 static int
-nfs_commit_list(struct inode *inode, struct list_head *head, int how)
+nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
 {
 	struct nfs_write_data	*data;
 
@@ -1407,7 +1409,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
 		goto out_bad;
 
 	/* Set up the argument struct */
-	return nfs_commit_rpcsetup(head, data, how);
+	return nfs_commit_rpcsetup(head, data, how, pnfs);
  out_bad:
 	nfs_mark_list_commit(head);
 	nfs_commit_clear_lock(NFS_I(inode));
@@ -1495,14 +1497,15 @@ static int nfs_commit_inode(struct inode *inode, int how)
 	LIST_HEAD(head);
 	int may_wait = how & FLUSH_SYNC;
 	int res = 0;
+	int use_pnfs = 0;
 
 	if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
 		goto out_mark_dirty;
 	spin_lock(&inode->i_lock);
-	res = nfs_scan_commit(inode, &head, 0, 0);
+	res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
 	spin_unlock(&inode->i_lock);
 	if (res) {
-		int error = nfs_commit_list(inode, &head, how);
+		int error = nfs_commit_list(inode, &head, how, use_pnfs);
 		if (error < 0)
 			return error;
 		if (may_wait) {
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 18a455c..06e5157 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -83,7 +83,8 @@ extern	void nfs_release_request(struct nfs_page *req);
 
 
 extern	int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
-			  pgoff_t idx_start, unsigned int npages, int tag);
+			  pgoff_t idx_start, unsigned int npages, int tag,
+			  int *use_pnfs);
 extern	void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 			     struct inode *inode,
 			     int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
-- 
1.6.6.1



* [PATCH 21/24] pnfs_submit: filelayout: rewrite filelayout_commit to use new API
  2010-06-08  4:19                                       ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Fred Isaman
@ 2010-06-08  4:19                                         ` Fred Isaman
  2010-06-08  4:19                                           ` [PATCH 22/24] pnfs_submit: remove unecessary pnfs_fl_call_data field pnfs_client Fred Isaman
  2010-06-09  9:09                                         ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Benny Halevy
  1 sibling, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

In the process, give it a much-needed rewrite.
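
The rewrite sorts pages into per-data-server buckets, with one extra
slot for the MDS, and carves both of its bookkeeping arrays out of a
single allocation.  The user-space sketch below shows that layout
trick with calloc().  MODEL_MAX_DS stands in for
NFS4_PNFS_MAX_MULTI_CNT, and the "note first use of an index"
bookkeeping is an assumption for illustration only; it is not taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_DS 4	/* hypothetical stand-in for NFS4_PNFS_MAX_MULTI_CNT */

struct model_buckets {
	void **lists;		/* MODEL_MAX_DS + 1 slots; the last is the MDS */
	uint16_t *indices_used;	/* lives in the same allocation as lists */
	int num_seen;
};

/*
 * One zeroed allocation holds the pointer array followed by the u16
 * array.  The pointer array comes first, so the u16 array that starts
 * right after it is suitably aligned.
 */
static int model_buckets_init(struct model_buckets *b)
{
	size_t slots = MODEL_MAX_DS + 1;

	b->lists = calloc(slots, sizeof(void *) + sizeof(uint16_t));
	if (b->lists == NULL)
		return -1;
	b->indices_used = (uint16_t *)(b->lists + slots);
	b->num_seen = 0;
	return 0;
}

/* Assumed bookkeeping: remember each index the first time it is used. */
static void model_buckets_note(struct model_buckets *b, uint16_t idx,
			       void *marker)
{
	if (b->lists[idx] == NULL) {
		b->lists[idx] = marker;
		b->indices_used[b->num_seen++] = idx;
	}
}

/* A single free() releases both arrays. */
static void model_buckets_free(struct model_buckets *b)
{
	free(b->lists);
	b->lists = NULL;
	b->indices_used = NULL;
}
```

Keeping both arrays in one block means a single failure check and a
single free on every exit path.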

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/nfs4filelayout.c |  192 ++++++++++++++++++++++++++---------------------
 fs/nfs/write.c          |    9 ++
 2 files changed, 115 insertions(+), 86 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index e36c95d..756cb64 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -530,8 +530,7 @@ filelayout_clone_write_data(struct nfs_write_data *old)
 	nfs_fattr_init(&new->fattr);
 	new->res.verf    = &new->verf;
 	new->args.context = get_nfs_open_context(old->args.context);
-	new->pdata.lseg = old->pdata.lseg;
-	kref_get(&new->pdata.lseg->kref);
+	new->pdata.lseg = NULL;
 	new->pdata.call_ops = old->pdata.call_ops;
 	new->pdata.how = old->pdata.how;
 out:
@@ -559,103 +558,124 @@ enum pnfs_try_status
 filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 		  struct nfs_write_data *data)
 {
-	struct nfs4_filelayout_segment *nfslay;
-	struct nfs_write_data   *dsdata = NULL;
+	LIST_HEAD(head);
+	struct nfs_page *req;
+	loff_t file_offset = 0;
+	u16 idx, i;
+	struct list_head **ds_page_list = NULL;
+	u16 *indices_used;
+	int num_indices_seen = 0;
+	const struct rpc_call_ops *call_ops;
+	struct rpc_clnt *clnt;
+	struct nfs_write_data **clone_list = NULL;
+	struct nfs_write_data *dsdata;
 	struct nfs4_pnfs_ds *ds;
-	struct nfs_page *req, *reqt;
-	struct list_head *pos, *tmp, head, head2;
-	loff_t file_offset, comp_offset;
-	enum pnfs_try_status trypnfs = PNFS_ATTEMPTED;
-	u32 idx1, idx2;
 
-	nfslay = LSEG_LD_DATA(data->pdata.lseg);
-
-	dprintk("%s data %p pnfs_client %p nfslay %p sync %d\n",
-		__func__, data, data->fldata.pnfs_client, nfslay, sync);
-
-	data->fldata.commit_through_mds = nfslay->commit_through_mds;
-	if (nfslay->commit_through_mds) {
-		dprintk("%s data %p commit through mds\n", __func__, data);
-		return PNFS_NOT_ATTEMPTED;
-	}
-
-	INIT_LIST_HEAD(&head);
-	INIT_LIST_HEAD(&head2);
-	list_add(&head, &data->pages);
-	list_del_init(&data->pages);
-
-	/* COMMIT to each Data Server */
-	while (!list_empty(&head)) {
-		req = nfs_list_entry(head.next);
-
-		file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
-
-		/* Get dserver for the current page */
-		idx1 = nfs4_fl_calc_ds_index(data->pdata.lseg, file_offset);
-		ds = nfs4_fl_prepare_ds(data->pdata.lseg, idx1);
-		if (!ds) {
-			data->pdata.pnfs_error = -EIO;
-			goto err_rewind;
+	dprintk("%s data %p pnfs_client %p sync %d\n",
+		__func__, data, data->fldata.pnfs_client, sync);
+
+	/* Alloc room for both in one go */
+	ds_page_list = kzalloc((NFS4_PNFS_MAX_MULTI_CNT + 1) *
+			       (sizeof(u16) + sizeof(struct list_head *)),
+			       GFP_KERNEL);
+	if (!ds_page_list)
+		goto mem_error;
+	indices_used = (u16 *) (ds_page_list + NFS4_PNFS_MAX_MULTI_CNT + 1);
+
+	/* Sort pages based on which ds to send to.
+	 * MDS is given index equal to NFS4_PNFS_MAX_MULTI_CNT.
+	 * Note we are assuming there is only a single lseg in play.
+	 * When that is not true, we could first sort on lseg, then
+	 * sort within each as we do here.
+	 */
+	while (!list_empty(&data->pages)) {
+		req = nfs_list_entry(data->pages.next);
+		nfs_list_remove_request(req);
+		if (!req->wb_lseg ||
+		    ((struct nfs4_filelayout_segment *)
+		     LSEG_LD_DATA(req->wb_lseg))->commit_through_mds)
+			idx = NFS4_PNFS_MAX_MULTI_CNT;
+		else {
+			file_offset = (loff_t)req->wb_index << PAGE_CACHE_SHIFT;
+			idx = nfs4_fl_calc_ds_index(req->wb_lseg, file_offset);
 		}
-
-		/* Gather all pages going to the current data server by
-		 * comparing their indices.
-		 * XXX: This recalculates the indices unecessarily.
-		 *      One idea would be to calc the index for every page
-		 *      and then compare if they are the same. */
-		list_for_each_safe(pos, tmp, &head) {
-			reqt = nfs_list_entry(pos);
-			comp_offset = (loff_t)reqt->wb_index << PAGE_CACHE_SHIFT;
-			idx2 = nfs4_fl_calc_ds_index(data->pdata.lseg,
-						     comp_offset);
-			if (idx1 == idx2) {
-				nfs_list_remove_request(reqt);
-				nfs_list_add_request(reqt, &head2);
-			}
+		if (ds_page_list[idx]) {
+			/* Already seen this idx */
+			list_add(&req->wb_list, ds_page_list[idx]);
+		} else {
+			/* New idx not seen so far */
+			list_add_tail(&req->wb_list, &head);
+			indices_used[num_indices_seen++] = idx;
 		}
-
-		if (!list_empty(&head)) {
-			dsdata = filelayout_clone_write_data(data);
-			if (!dsdata) {
-				/* return pages back to head */
-				list_splice(&head2, &head);
-				INIT_LIST_HEAD(&head2);
-				data->pdata.pnfs_error = -ENOMEM;
-				goto err_rewind;
-			}
+		ds_page_list[idx] = &req->wb_list;
+	}
+	/* Once created, clone must be released via call_op */
+	clone_list = kzalloc(num_indices_seen *
+			     sizeof(struct nfs_write_data *), GFP_KERNEL);
+	if (!clone_list)
+		goto mem_error;
+	for (i = 0; i < num_indices_seen - 1; i++) {
+		clone_list[i] = filelayout_clone_write_data(data);
+		if (!clone_list[i])
+			goto mem_error;
+	}
+	clone_list[i] = data;
+	/* Now send off the RPCs to each ds.  Note that it is important
+	 * that any RPC to the MDS be sent last (or at least after all
+	 * clones have been made.)
+	 */
+	for (i = 0; i < num_indices_seen; i++) {
+		dsdata = clone_list[i];
+		idx = indices_used[i];
+		list_cut_position(&dsdata->pages, &head, ds_page_list[idx]);
+		if (idx == NFS4_PNFS_MAX_MULTI_CNT) {
+			call_ops = data->pdata.call_ops;
+			clnt = NFS_CLIENT(dsdata->inode);
+			ds = NULL;
 		} else {
-			dsdata = data;
+			call_ops = &filelayout_commit_call_ops;
+			req = nfs_list_entry(dsdata->pages.next);
+			ds = nfs4_fl_prepare_ds(req->wb_lseg, idx);
+			if (!ds) {
+				/* Trigger retry of this chunk through MDS */
+				dsdata->task.tk_status = -EIO;
+				data->pdata.call_ops->rpc_release(dsdata);
+				continue;
+			}
+			clnt = ds->ds_clp->cl_rpcclient;
+			dsdata->fldata.pnfs_client = clnt;
+			dsdata->fldata.ds_nfs_client = ds->ds_clp;
+			dsdata->args.fh = \
+				nfs4_fl_select_ds_fh(LSEG_LD_DATA(req->wb_lseg),
+						     idx);
 		}
-
-		list_add(&dsdata->pages, &head2);
-		list_del_init(&head2);
-
-		dsdata->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
-		dsdata->fldata.ds_nfs_client = ds->ds_clp;
-		dsdata->args.fh = nfs4_fl_select_ds_fh(nfslay, idx1);
-
 		dprintk("%s: Initiating commit: %llu USE DS:\n",
 			__func__, file_offset);
 		print_ds(ds);
 
 		/* Send COMMIT to data server */
-		nfs_initiate_commit(dsdata, dsdata->fldata.pnfs_client,
-				    &filelayout_commit_call_ops, sync);
+		nfs_initiate_commit(dsdata, clnt, call_ops, sync);
 	}
+	kfree(clone_list);
+	kfree(ds_page_list);
+	data->pdata.pnfs_error = 0;
+	return PNFS_ATTEMPTED;
 
-out:
-	if (data->pdata.pnfs_error)
-		printk(KERN_ERR "%s: ERROR %d\n", __func__,
-		       data->pdata.pnfs_error);
-
-	/* XXX should we send COMMIT to MDS e.g. not free data and return 1 ? */
-	return trypnfs;
-err_rewind:
-	/* put remaining pages back onto the original data->pages */
-	list_add(&data->pages, &head);
-	list_del_init(&head);
-	trypnfs = PNFS_NOT_ATTEMPTED;
-	goto out;
+ mem_error:
+	if (clone_list) {
+		for (i = 0; i < num_indices_seen - 1; i++) {
+			if (!clone_list[i])
+				break;
+			data->pdata.call_ops->rpc_release(clone_list[i]);
+		}
+		kfree(clone_list);
+	}
+	kfree(ds_page_list);
+	/* One of these will be empty, but doesn't hurt to do both */
+	nfs_mark_list_commit(&head);
+	nfs_mark_list_commit(&data->pages);
+	data->pdata.call_ops->rpc_release(data);
+	return PNFS_ATTEMPTED;
 }
 
 /* Return the stripesize for the specified file.
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ebc9452..8406fc1 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1461,6 +1461,15 @@ static void nfs_commit_release(void *calldata)
 			req->wb_bytes,
 			(long long)req_offset(req));
 		if (status < 0) {
+			if (req->wb_lseg) {
+				struct pnfs_layout_segment *lseg = req->wb_lseg;
+
+				req->wb_lseg = NULL;
+				put_lseg(lseg);
+				dprintk(" retry through MDS\n");
+				nfs_mark_request_dirty(req);
+				goto next;
+			}
 			nfs_context_set_write_error(req->wb_context, status);
 			nfs_inode_remove_request(req);
 			dprintk(", error = %d\n", status);
-- 
1.6.6.1
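
The page-sorting step in the rewritten filelayout_commit above can be
illustrated outside the kernel.  What follows is a minimal standalone
sketch, not the patch's code: it counts pages per bucket instead of
threading them on a list, and MAX_DS, struct page_req, bucket_of() and
the round-robin striping rule are all hypothetical stand-ins.  As in
the patch, pages without an lseg land in one extra bucket reserved for
the MDS, and buckets are recorded in first-seen order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DS   4        /* hypothetical stand-in for NFS4_PNFS_MAX_MULTI_CNT */
#define MDS_SLOT MAX_DS   /* one extra bucket reserved for the MDS */

/* Hypothetical page descriptor, loosely modelled on struct nfs_page. */
struct page_req {
	unsigned long index;  /* page index within the file */
	int has_lseg;         /* 0 models req->wb_lseg == NULL */
};

/* Toy striping rule (an assumption): one page per stripe, round robin. */
static unsigned int bucket_of(const struct page_req *req)
{
	if (!req->has_lseg)
		return MDS_SLOT;
	return (unsigned int)(req->index % MAX_DS);
}

/*
 * Single pass over the pages.  count[] receives the number of pages per
 * bucket and order[] records the buckets in first-seen order.
 * Returns the number of distinct buckets seen.
 */
static size_t sort_pages(const struct page_req *reqs, size_t n,
			 size_t count[MAX_DS + 1],
			 unsigned int order[MAX_DS + 1])
{
	size_t i, seen = 0;

	for (i = 0; i <= MAX_DS; i++)
		count[i] = 0;
	for (i = 0; i < n; i++) {
		unsigned int idx = bucket_of(&reqs[i]);

		assert(idx <= MDS_SLOT);
		if (count[idx]++ == 0)
			order[seen++] = idx;   /* new bucket not seen so far */
	}
	return seen;
}
```

The return value corresponds to num_indices_seen in the patch, which is
the number of RPCs to send: one per bucket, with all but one of them
needing a cloned nfs_write_data.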



* [PATCH 22/24] pnfs_submit: remove unecessary pnfs_fl_call_data field pnfs_client
  2010-06-08  4:19                                         ` [PATCH 21/24] pnfs_submit: filelayout: rewrite filelayout_commit to use new API Fred Isaman
@ 2010-06-08  4:19                                           ` Fred Isaman
  2010-06-08  4:19                                             ` [PATCH 23/24] pnfs_submit: remove unecessary pnfs_fl_call_data field commit_through_mds Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/nfs4filelayout.c |    9 +++------
 include/linux/nfs_xdr.h |    1 -
 2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/nfs4filelayout.c b/fs/nfs/nfs4filelayout.c
index 756cb64..b82e4ff 100644
--- a/fs/nfs/nfs4filelayout.c
+++ b/fs/nfs/nfs4filelayout.c
@@ -216,7 +216,6 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
 
 	/* just try the first data server for the index..*/
 	data->fldata.ds_nfs_client = ds->ds_clp;
-	data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
 	data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);
 
 	/* Now get the file offset on the dserver
@@ -230,7 +229,7 @@ filelayout_read_pagelist(struct pnfs_layout_type *layoutid,
 	data->fldata.orig_offset = offset;
 
 	/* Perform an asynchronous read */
-	nfs_initiate_read(data, data->fldata.pnfs_client,
+	nfs_initiate_read(data, ds->ds_clp->cl_rpcclient,
 			  &filelayout_read_call_ops);
 
 	data->pdata.pnfs_error = 0;
@@ -269,7 +268,6 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
 		htonl(ds->ds_ip_addr), ntohs(ds->ds_port), ds->r_addr);
 
 	data->fldata.ds_nfs_client = ds->ds_clp;
-	data->fldata.pnfs_client = ds->ds_clp->cl_rpcclient;
 	data->args.fh = nfs4_fl_select_ds_fh(flseg, idx);
 
 	/* Get the file offset on the dserver. Set the write offset to
@@ -281,7 +279,7 @@ filelayout_write_pagelist(struct pnfs_layout_type *layoutid,
 	/* Perform an asynchronous write The offset will be reset in the
 	 * call_ops->rpc_call_done() routine
 	 */
-	nfs_initiate_write(data, data->fldata.pnfs_client,
+	nfs_initiate_write(data, ds->ds_clp->cl_rpcclient,
 			   &filelayout_write_call_ops, sync);
 
 	data->pdata.pnfs_error = 0;
@@ -572,7 +570,7 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 	struct nfs4_pnfs_ds *ds;
 
 	dprintk("%s data %p pnfs_client %p sync %d\n",
-		__func__, data, data->fldata.pnfs_client, sync);
+		__func__, data, data->fldata.ds_nfs_client->cl_rpcclient, sync);
 
 	/* Alloc room for both in one go */
 	ds_page_list = kzalloc((NFS4_PNFS_MAX_MULTI_CNT + 1) *
@@ -643,7 +641,6 @@ filelayout_commit(struct pnfs_layout_type *layoutid, int sync,
 				continue;
 			}
 			clnt = ds->ds_clp->cl_rpcclient;
-			dsdata->fldata.pnfs_client = clnt;
 			dsdata->fldata.ds_nfs_client = ds->ds_clp;
 			dsdata->args.fh = \
 				nfs4_fl_select_ds_fh(LSEG_LD_DATA(req->wb_lseg),
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 07d6dd2..183a9c3 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -980,7 +980,6 @@ struct pnfs_call_data {
 
 /* files layout-type specific data for read, write, and commit */
 struct pnfs_fl_call_data {
-	struct rpc_clnt		*pnfs_client;	/* Holds pNFS device across async calls */
 	struct nfs_client	*ds_nfs_client;
 	__u64			orig_offset;
 	int			commit_through_mds;
-- 
1.6.6.1



* [PATCH 23/24] pnfs_submit: remove unecessary pnfs_fl_call_data field commit_through_mds
  2010-06-08  4:19                                           ` [PATCH 22/24] pnfs_submit: remove unecessary pnfs_fl_call_data field pnfs_client Fred Isaman
@ 2010-06-08  4:19                                             ` Fred Isaman
  2010-06-08  4:19                                               ` [PATCH 24/24] pnfs_submit: pnfs_update_layout can return void Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/nfs4proc.c       |    8 ++++----
 include/linux/nfs_xdr.h |    1 -
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index aa4581a..ca17872 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3362,17 +3362,17 @@ static void nfs4_proc_commit_setup(struct nfs_write_data *data, struct rpc_messa
 
 #if defined(CONFIG_NFS_V4_1)
 /*
- * pNFS doew not send a getattr to Data Serfers on commit.
+ * pNFS does not send a getattr to Data Servers on commit.
  */
 static void
 pnfs4_proc_commit_setup(struct nfs_write_data *data, struct rpc_message *msg)
 {
 	struct nfs_server *server = NFS_SERVER(data->inode);
 
-	dprintk("--> %s ds_nfs_client %p commit_through_mds %d\n", __func__,
-		data->fldata.ds_nfs_client, data->fldata.commit_through_mds);
+	dprintk("--> %s ds_nfs_client %p\n", __func__,
+		data->fldata.ds_nfs_client);
 
-	if (!data->fldata.ds_nfs_client || data->fldata.commit_through_mds)
+	if (!data->fldata.ds_nfs_client)
 		return nfs4_proc_commit_setup(data, msg);
 
 	data->args.bitmask = server->attr_bitmask;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 183a9c3..2acdb8e 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -982,7 +982,6 @@ struct pnfs_call_data {
 struct pnfs_fl_call_data {
 	struct nfs_client	*ds_nfs_client;
 	__u64			orig_offset;
-	int			commit_through_mds;
 };
 #endif /* CONFIG_NFS_V4_1 */
 
-- 
1.6.6.1



* [PATCH 24/24] pnfs_submit: pnfs_update_layout can return void
  2010-06-08  4:19                                             ` [PATCH 23/24] pnfs_submit: remove unecessary pnfs_fl_call_data field commit_through_mds Fred Isaman
@ 2010-06-08  4:19                                               ` Fred Isaman
  0 siblings, 0 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-08  4:19 UTC (permalink / raw
  To: linux-nfs

Signed-off-by: Fred Isaman <iisaman@netapp.com>
---
 fs/nfs/pnfs.c |   27 +++++++++------------------
 fs/nfs/pnfs.h |   11 ++++-------
 2 files changed, 13 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 9f28b28..4f5d2ea 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1072,7 +1072,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
  * If lsegpp is given, the appropriate layout segment is referenced and
  * returned to the caller.
  */
-int
+void
 _pnfs_update_layout(struct inode *ino,
 		   struct nfs_open_context *ctx,
 		   u64 count,
@@ -1090,14 +1090,12 @@ _pnfs_update_layout(struct inode *ino,
 	struct pnfs_layout_segment *lseg = NULL;
 	bool take_ref = (lsegpp != NULL);
 	DEFINE_WAIT(__wait);
-	int result = 0;
 
 	if (take_ref)
 		*lsegpp = NULL;
 	lo = get_lock_alloc_layout(ino);
 	if (IS_ERR(lo)) {
 		dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
-		result = PTR_ERR(lo);
 		goto out;
 	}
 
@@ -1108,7 +1106,6 @@ _pnfs_update_layout(struct inode *ino,
 			put_lseg_locked(lseg);
 		/* someone is cleaning the layout */
 		lseg = NULL;
-		result = -EAGAIN;
 		goto out_put;
 	}
 
@@ -1131,21 +1128,18 @@ _pnfs_update_layout(struct inode *ino,
 			clear_bit(lo_fail_bit(iomode),
 				  &nfsi->layout.pnfs_layout_state);
 			nfsi->layout.pnfs_layout_suspend = 0;
-		} else {
-			result = 1;
+		} else
 			goto out_put;
-		}
 	}
 
 	/* Lose lock, but not reference, match this with pnfs_layout_release */
 	unlock_current_layout(nfsi);
 
-	result = get_layout(ino, ctx, &arg, lsegpp, lo);
+	get_layout(ino, ctx, &arg, lsegpp, lo);
 out:
-	dprintk("%s end (err:%d) state 0x%lx lseg %p\n",
-			__func__, result, nfsi->layout.pnfs_layout_state,
-		lseg);
-	return result;
+	dprintk("%s end, state 0x%lx lseg %p\n", __func__,
+		nfsi->layout.pnfs_layout_state, lseg);
+	return;
 out_put:
 	if (lsegpp)
 		*lsegpp = lseg;
@@ -1386,7 +1380,6 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 	struct nfs_server *nfss = NFS_SERVER(inode);
 	size_t count = 0;
 	loff_t loff;
-	int status = 0;
 
 	pgio->pg_iswrite = 0;
 	pgio->pg_boundary = 0;
@@ -1400,11 +1393,9 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
 	readahead_range(inode, pages, &loff, &count);
 
 	if (count > 0) {
-		status = _pnfs_update_layout(inode, ctx, count,
-					    loff, IOMODE_READ,
-					    &pgio->pg_lseg);
-		dprintk("%s virt update returned %d\n", __func__, status);
-		if (status != 0)
+		_pnfs_update_layout(inode, ctx, count, loff, IOMODE_READ,
+				    &pgio->pg_lseg);
+		if (!pgio->pg_lseg)
 			return;
 
 		pgio->pg_boundary = pnfs_getboundary(inode);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index e231ca3..f7e21dc 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -32,7 +32,7 @@ extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait
 extern const nfs4_stateid zero_stateid;
 
 void put_lseg(struct pnfs_layout_segment *lseg);
-int _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
+void _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 	u64 count, loff_t pos, enum pnfs_iomode access_type,
 	struct pnfs_layout_segment **lsegpp);
 
@@ -176,7 +176,7 @@ static inline int pnfs_return_layout(struct inode *ino,
 	return 0;
 }
 
-static inline int pnfs_update_layout(struct inode *ino,
+static inline void pnfs_update_layout(struct inode *ino,
 	struct nfs_open_context *ctx,
 	u64 count, loff_t pos, enum pnfs_iomode access_type,
 	struct pnfs_layout_segment **lsegpp)
@@ -184,12 +184,10 @@ static inline int pnfs_update_layout(struct inode *ino,
 	struct nfs_server *nfss = NFS_SERVER(ino);
 
 	if (pnfs_enabled_sb(nfss))
-		return _pnfs_update_layout(ino, ctx, count, pos,
-					   access_type, lsegpp);
+		_pnfs_update_layout(ino, ctx, count, pos, access_type, lsegpp);
 	else {
 		if (lsegpp)
 			*lsegpp = NULL;
-		return 0;
 	}
 }
 
@@ -221,14 +219,13 @@ static inline void put_lseg(struct pnfs_layout_segment *lseg)
 {
 }
 
-static inline int
+static inline void
 pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
 	u64 count, loff_t pos, enum pnfs_iomode access_type,
 	struct pnfs_layout_segment **lsegpp)
 {
 	if (lsegpp)
 		*lsegpp = NULL;
-	return 0;
 }
 
 static inline enum pnfs_try_status
-- 
1.6.6.1
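
The calling convention this patch settles on (no return code, with
success signalled only through the out-parameter) can be sketched in
isolation.  This is an illustrative sketch under stated assumptions:
struct segment, update_layout() and use_pnfs() are hypothetical names,
and the lookup is reduced to "a segment is available or it is not".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pnfs_layout_segment. */
struct segment {
	int refcount;
};

/*
 * Returns nothing.  If segp is given, *segp is set to a referenced
 * segment on success and to NULL otherwise.
 */
static void update_layout(struct segment *available, struct segment **segp)
{
	if (segp)
		*segp = NULL;
	if (!available)
		return;               /* no layout: caller uses plain NFS */
	assert(available->refcount >= 0);
	if (segp) {
		available->refcount++;    /* reference handed to the caller */
		*segp = available;
	}
}

/* The caller picks the I/O path purely from the out-parameter. */
static int use_pnfs(struct segment *available)
{
	struct segment *seg;

	update_layout(available, &seg);
	return seg != NULL;
}
```

This mirrors the change in pnfs_pageio_init_read above, where the test
on status is replaced by a test on pgio->pg_lseg.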



* Re: [PATCH 16/24] pnfs_submit: remove pnfs_file_operations
  2010-06-08  4:19                               ` [PATCH 16/24] pnfs_submit: remove pnfs_file_operations Fred Isaman
  2010-06-08  4:19                                 ` [PATCH 17/24] pnfs_submit: remove pnfs_update_layout_commit Fred Isaman
@ 2010-06-08  7:34                                 ` Christoph Hellwig
  1 sibling, 0 replies; 46+ messages in thread
From: Christoph Hellwig @ 2010-06-08  7:34 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

> -ssize_t
> -pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
> -		loff_t *pos)
> -{

Just doing this in ->write also means you were missing out on AIO
and vectored writes, so it can't have been that big a help.



* Re: [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation
  2010-06-08  4:19                                       ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Fred Isaman
  2010-06-08  4:19                                         ` [PATCH 21/24] pnfs_submit: filelayout: rewrite filelayout_commit to use new API Fred Isaman
@ 2010-06-09  9:09                                         ` Benny Halevy
  2010-06-09 12:21                                           ` Fred Isaman
  1 sibling, 1 reply; 46+ messages in thread
From: Benny Halevy @ 2010-06-09  9:09 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
> WARNING - this is an API change.
> 
> The layout driver's commit operation no longer takes an lseg.
> This is because each nfs_page may or may not have an associated lseg.
> It is the layout driver's task to send commits to the appropriate place.

So if the appropriate place for all pages that have no lseg associated
with them is the MDS, why shouldn't the generic layer do that?

Benny

> 
> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> ---
>  fs/nfs/internal.h        |    2 +-
>  fs/nfs/pagelist.c        |    5 ++-
>  fs/nfs/pnfs.c            |   79 ++++++++-------------------------------------
>  fs/nfs/pnfs.h            |   21 +++++-------
>  fs/nfs/write.c           |   23 +++++++------
>  include/linux/nfs_page.h |    3 +-
>  6 files changed, 43 insertions(+), 90 deletions(-)
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index b754446..a30974a 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -286,7 +286,7 @@ extern int nfs_initiate_commit(struct nfs_write_data *data,
>  extern int pnfs_initiate_commit(struct nfs_write_data *data,
>  			       struct rpc_clnt *clnt,
>  			       const struct rpc_call_ops *call_ops,
> -			       int how);
> +				int how, int pnfs);
>  extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
>  extern void nfs_mark_list_commit(struct list_head *head);
>  #ifdef CONFIG_MIGRATION
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index c3e5a1f..c8de900 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -380,6 +380,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>   * @idx_start: lower bound of page->index to scan
>   * @npages: idx_start + npages sets the upper bound to scan.
>   * @tag: tag to scan for
> + * @use_pnfs: will be set TRUE if commit needs to be handled by layout driver
>   *
>   * Moves elements from one of the inode request lists.
>   * If the number of requests is set to 0, the entire address_space
> @@ -389,7 +390,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>   */
>  int nfs_scan_list(struct nfs_inode *nfsi,
>  		struct list_head *dst, pgoff_t idx_start,
> -		unsigned int npages, int tag)
> +		  unsigned int npages, int tag, int *use_pnfs)
>  {
>  	struct nfs_page *pgvec[NFS_SCAN_MAXENTRIES];
>  	struct nfs_page *req;
> @@ -420,6 +421,8 @@ int nfs_scan_list(struct nfs_inode *nfsi,
>  				radix_tree_tag_clear(&nfsi->nfs_page_tree,
>  						req->wb_index, tag);
>  				nfs_list_add_request(req, dst);
> +				if (req->wb_lseg)
> +					*use_pnfs = 1;
>  				res++;
>  				if (res == INT_MAX)
>  					goto out;
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 4907e3a..9f28b28 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1672,19 +1672,11 @@ enum pnfs_try_status
>  _pnfs_try_to_commit(struct nfs_write_data *data,
>  		    const struct rpc_call_ops *call_ops, int how)
>  {
> -	struct inode *inode = data->inode;
> -
> -	if (!pnfs_enabled_sb(NFS_SERVER(inode))) {
> -		dprintk("%s: Not using pNFS I/O\n", __func__);
> -		return PNFS_NOT_ATTEMPTED;
> -	} else {
> -		/* data->call_ops and data->how set in nfs_commit_rpcsetup */
> -		dprintk("%s: Utilizing pNFS I/O\n", __func__);
> -		data->pdata.call_ops = call_ops;
> -		data->pdata.pnfs_error = 0;
> -		data->pdata.how = how;
> -		return pnfs_commit(data, how);
> -	}
> +	dprintk("%s: Utilizing pNFS I/O\n", __func__);
> +	data->pdata.call_ops = call_ops;
> +	data->pdata.pnfs_error = 0;
> +	data->pdata.how = how;
> +	return pnfs_commit(data, how);
>  }
>  
>  /* pNFS Commit callback function for all layout drivers */
> @@ -1705,76 +1697,33 @@ pnfs_commit_done(struct nfs_write_data *data)
>  		_pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
>  				    true);
>  		pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
> -				     pdata->call_ops, pdata->how);
> +				     pdata->call_ops, pdata->how, 1);
>  	}
>  }
>  
>  static enum pnfs_try_status
>  pnfs_commit(struct nfs_write_data *data, int sync)
>  {
> -	int result;
>  	struct nfs_inode *nfsi = NFS_I(data->inode);
>  	struct nfs_server *nfss = NFS_SERVER(data->inode);
> -	struct pnfs_layout_segment *lseg;
> -	struct nfs_page *first, *last, *p;
> -	int npages;
>  	enum pnfs_try_status trypnfs;
> -	u64 count;
>  
>  	dprintk("%s: Begin\n", __func__);
>  
> -	/* If the layout driver doesn't define its own commit function
> -	 * use standard NFSv4 commit
> -	 */
> -	first = last = nfs_list_entry(data->pages.next);
> -	npages = 0;
> -	list_for_each_entry(p, &data->pages, wb_list) {
> -		last = p;
> -		npages++;
> -	}
> -	/* COMMIT indicates the whole file with offset = count = 0
> -	 * whereas layout segments indicate whole file with offset = 0,
> -	 * count = NFS4_MAX_UINT64.
> +	/* We need to account for possibility that
> +	 * each nfs_page can point to a different lseg (or be NULL).
> +	 * For the immediate case of whole-file-only layouts, we at
> +	 * least know there can be only a single lseg.
> +	 * We still have to account for the possibility of some being NULL.
> +	 * This will be done by passing the buck to the layout driver.
>  	 */
> -	count = ((npages - 1) << PAGE_CACHE_SHIFT) + first->wb_bytes +
> -		 (first != last) ? last->wb_bytes : 0;
> -	if (first->wb_offset == 0 && count == 0)
> -		count = NFS4_MAX_UINT64;
> -
> -	/* FIXME: we really ought to keep the layout segment that we used
> -	   to write the page around for committing it and never ask for a
> -	   new one.  If it was recalled we better commit the data first
> -	   before returning it, otherwise the data needs to be rewritten,
> -	   either with a new layout or to the MDS */
> -	result = _pnfs_update_layout(data->inode,
> -				    NULL,
> -				    count,
> -				    first->wb_offset,
> -				    IOMODE_RW,
> -				    &lseg);
> -	/* If no layout have been retrieved,
> -	 * use standard NFSv4 commit
> -	 */
> -	if (result) {
> -		dprintk("%s: Updating layout failed (%d), retry with NFS \n",
> -			__func__, result);
> -		trypnfs = PNFS_NOT_ATTEMPTED;
> -		goto out;
> -	}
> -
> -	dprintk("%s: Calling layout driver commit\n", __func__);
> +	data->pdata.lseg = NULL;
>  	if (!pnfs_use_rpc(nfss))
>  		data->pdata.pnfsflags |= PNFS_NO_RPC;
> -	data->pdata.lseg = lseg;
>  	trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
>  							sync, data);
> -	if (trypnfs == PNFS_NOT_ATTEMPTED) {
> +	if (trypnfs == PNFS_NOT_ATTEMPTED)
>  		data->pdata.pnfsflags &= ~PNFS_NO_RPC;
> -		data->pdata.lseg = NULL;
> -		put_lseg(lseg);
> -	}
> -
> -out:
>  	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>  	return trypnfs;
>  }
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index ea54210..e231ca3 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -140,21 +140,18 @@ pnfs_try_to_commit(struct nfs_write_data *data,
>  		   const struct rpc_call_ops *call_ops,
>  		   int how)
>  {
> -	struct inode *inode = data->inode;
> -	struct nfs_server *nfss = NFS_SERVER(inode);
>  	enum pnfs_try_status ret;
>  
> -	/* Note that we check for "write_pagelist" and not for "commit"
> -	   since if async writes were done and pages weren't marked as stable
> -	   the commit method MUST be defined by the LD */
> -	/* FIXME: write_pagelist should probably be mandated */
> -	if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
> -		ret = _pnfs_try_to_commit(data, call_ops, how);
> -	else
> -		ret = PNFS_NOT_ATTEMPTED;
> -
> +	/* Unlike in pnfs_try_to_write_data and pnfs_try_to_read_data,
> +	 * we have no guarantee that all nfs_pages point to the same
> +	 * lseg.  However, if we reach here, we are guaranteed that at
> +	 * least one points to some lseg.
> +	 */
> +	ret = _pnfs_try_to_commit(data, call_ops, how);
>  	if (ret == PNFS_ATTEMPTED)
> -		nfs_inc_stats(inode, NFSIOS_PNFS_COMMIT);
> +		nfs_inc_stats(data->inode, NFSIOS_PNFS_COMMIT);
> +	else
> +		_pnfs_clear_lseg_from_pages(&data->pages);
>  	return ret;
>  }
>  
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 811c776..ebc9452 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -527,7 +527,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
>   * The requests are *not* checked to ensure that they form a contiguous set.
>   */
>  static int
> -nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
> +nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages, int *use_pnfs)
>  {
>  	struct nfs_inode *nfsi = NFS_I(inode);
>  	int ret;
> @@ -535,7 +535,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
>  	if (!nfs_need_commit(nfsi))
>  		return 0;
>  
> -	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
> +	ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT,
> +			    use_pnfs);
>  	if (ret > 0)
>  		nfsi->ncommit -= ret;
>  	if (nfs_need_commit(NFS_I(inode)))
> @@ -1334,9 +1335,10 @@ EXPORT_SYMBOL(nfs_initiate_commit);
>  int pnfs_initiate_commit(struct nfs_write_data *data,
>  			 struct rpc_clnt *clnt,
>  			 const struct rpc_call_ops *call_ops,
> -			 int how)
> +			 int how, int pnfs)
>  {
> -	if (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED)
> +	if (pnfs &&
> +	    (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED))
>  		return pnfs_get_write_status(data);
>  
>  	return nfs_initiate_commit(data, clnt, &nfs_commit_ops, how);
> @@ -1347,7 +1349,7 @@ int pnfs_initiate_commit(struct nfs_write_data *data,
>   */
>  static int nfs_commit_rpcsetup(struct list_head *head,
>  		struct nfs_write_data *data,
> -		int how)
> +		int how, int pnfs)
>  {
>  	struct nfs_page *first = nfs_list_entry(head->next);
>  	struct inode *inode = first->wb_context->path.dentry->d_inode;
> @@ -1374,7 +1376,7 @@ static int nfs_commit_rpcsetup(struct list_head *head,
>  	data->args.context = first->wb_context;  /* used by commit done */
>  
>  	return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
> -				    how);
> +				    how, pnfs);
>  }
>  
>  /* Handle memory error during commit */
> @@ -1398,7 +1400,7 @@ EXPORT_SYMBOL(nfs_mark_list_commit);
>   * Commit dirty pages
>   */
>  static int
> -nfs_commit_list(struct inode *inode, struct list_head *head, int how)
> +nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
>  {
>  	struct nfs_write_data	*data;
>  
> @@ -1407,7 +1409,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
>  		goto out_bad;
>  
>  	/* Set up the argument struct */
> -	return nfs_commit_rpcsetup(head, data, how);
> +	return nfs_commit_rpcsetup(head, data, how, pnfs);
>   out_bad:
>  	nfs_mark_list_commit(head);
>  	nfs_commit_clear_lock(NFS_I(inode));
> @@ -1495,14 +1497,15 @@ static int nfs_commit_inode(struct inode *inode, int how)
>  	LIST_HEAD(head);
>  	int may_wait = how & FLUSH_SYNC;
>  	int res = 0;
> +	int use_pnfs = 0;
>  
>  	if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
>  		goto out_mark_dirty;
>  	spin_lock(&inode->i_lock);
> -	res = nfs_scan_commit(inode, &head, 0, 0);
> +	res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
>  	spin_unlock(&inode->i_lock);
>  	if (res) {
> -		int error = nfs_commit_list(inode, &head, how);
> +		int error = nfs_commit_list(inode, &head, how, use_pnfs);
>  		if (error < 0)
>  			return error;
>  		if (may_wait) {
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index 18a455c..06e5157 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -83,7 +83,8 @@ extern	void nfs_release_request(struct nfs_page *req);
>  
>  
>  extern	int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
> -			  pgoff_t idx_start, unsigned int npages, int tag);
> +			  pgoff_t idx_start, unsigned int npages, int tag,
> +			  int *use_pnfs);
>  extern	void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>  			     struct inode *inode,
>  			     int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),



* Re: [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-08  4:19                           ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Fred Isaman
  2010-06-08  4:19                             ` [PATCH 15/24] pnfs_submit: stash and refcount lseg in write path Fred Isaman
@ 2010-06-09 10:38                             ` Benny Halevy
  2010-06-09 12:08                               ` Fred Isaman
  2010-06-09 19:33                             ` Boaz Harrosh
  2 siblings, 1 reply; 46+ messages in thread
From: Benny Halevy @ 2010-06-09 10:38 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

Fred, how does that patch interact with
285052f pnfs_post_submit: Restore "pnfs: pnfs_do_flush"
and the later patches that depend on it?

Benny

On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
> Preparing for LAYOUTGET invocation in nfs_write_begin to be the
> only invocation in the write path.
> 
> It isn't used at all yet, but it should be properly referenced/dereferenced
> 
> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> ---
>  fs/nfs/file.c |   16 +++++++++++++---
>  1 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 03601d2..fde6cb5 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -420,6 +420,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>  		file->f_path.dentry->d_name.name,
>  		mapping->host->i_ino, len, (long long) pos);
>  
> +	pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
> +			   (struct pnfs_layout_segment **) fsdata);
>  start:
>  	/*
>  	 * Prevent starvation issues if someone is doing a consistency
> @@ -428,11 +430,13 @@ start:
>  	ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
>  			nfs_wait_bit_killable, TASK_KILLABLE);
>  	if (ret)
> -		return ret;
> +		goto out;
>  
>  	page = grab_cache_page_write_begin(mapping, index, flags);
> -	if (!page)
> -		return -ENOMEM;
> +	if (!page) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
>  	*pagep = page;
>  
>  	ret = nfs_flush_incompatible(file, page);
> @@ -447,6 +451,11 @@ start:
>  		if (!ret)
>  			goto start;
>  	}
> + out:
> +	if (ret) {
> +		put_lseg(*fsdata);
> +		*fsdata = NULL;
> +	}
>  	return ret;
>  }
>  
> @@ -486,6 +495,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
>  
>  	unlock_page(page);
>  	page_cache_release(page);
> +	put_lseg(fsdata);
>  
>  	if (status < 0)
>  		return status;


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-09 10:38                             ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Benny Halevy
@ 2010-06-09 12:08                               ` Fred Isaman
  2010-06-10 10:33                                 ` Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-09 12:08 UTC (permalink / raw
  To: Benny Halevy; +Cc: Fred Isaman, linux-nfs

On Wed, Jun 9, 2010 at 6:38 AM, Benny Halevy <bhalevy@panasas.com> wrote:
> Fred, how does that patch interact with
> 285052f pnfs_post_submit: Restore "pnfs: pnfs_do_flush"
> and the later patches that depend on it?
>
> Benny
>

They will have to be modified.  I'll look at that today.

Fred

> On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
>> Preparing for LAYOUTGET invocation in nfs_write_begin to be the
>> only invocation in the write path.
>>
>> It isn't used at all yet, but it should be properly referenced/dereferenced
>>
>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>> ---
>>  fs/nfs/file.c |   16 +++++++++++++---
>>  1 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>> index 03601d2..fde6cb5 100644
>> --- a/fs/nfs/file.c
>> +++ b/fs/nfs/file.c
>> @@ -420,6 +420,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>>               file->f_path.dentry->d_name.name,
>>               mapping->host->i_ino, len, (long long) pos);
>>
>> +     pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
>> +                        (struct pnfs_layout_segment **) fsdata);
>>  start:
>>       /*
>>        * Prevent starvation issues if someone is doing a consistency
>> @@ -428,11 +430,13 @@ start:
>>       ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
>>                       nfs_wait_bit_killable, TASK_KILLABLE);
>>       if (ret)
>> -             return ret;
>> +             goto out;
>>
>>       page = grab_cache_page_write_begin(mapping, index, flags);
>> -     if (!page)
>> -             return -ENOMEM;
>> +     if (!page) {
>> +             ret = -ENOMEM;
>> +             goto out;
>> +     }
>>       *pagep = page;
>>
>>       ret = nfs_flush_incompatible(file, page);
>> @@ -447,6 +451,11 @@ start:
>>               if (!ret)
>>                       goto start;
>>       }
>> + out:
>> +     if (ret) {
>> +             put_lseg(*fsdata);
>> +             *fsdata = NULL;
>> +     }
>>       return ret;
>>  }
>>
>> @@ -486,6 +495,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
>>
>>       unlock_page(page);
>>       page_cache_release(page);
>> +     put_lseg(fsdata);
>>
>>       if (status < 0)
>>               return status;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation
  2010-06-09  9:09                                         ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Benny Halevy
@ 2010-06-09 12:21                                           ` Fred Isaman
  2010-06-09 15:12                                             ` Boaz Harrosh
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-09 12:21 UTC (permalink / raw
  To: Benny Halevy; +Cc: Fred Isaman, linux-nfs

On Wed, Jun 9, 2010 at 5:09 AM, Benny Halevy <bhalevy@panasas.com> wrote:
> On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
>> WARNING - this is an API change.
>>
>> The layout driver's commit operation no longer takes an lseg.
>> This is because each nfs_page may or may not have an associated lseg.
>> It is the layout driver's task to send commits to the appropriate place.
>
> So if the appropriate place for all pages that have no lseg associated
> with them is the MDS why shouldn't the generic layer do that?
>
> Benny
>

Parceling out the commit to different servers is a pnfs requirement,
not a general code requirement.  While it is true that the general
layer knows how to send to the MDS, it does not know how to split the
COMMIT into pieces.

The filelayout driver already needs code to parcel out the commit to
different data servers.  Adding the MDS is trivial.

The block driver is not affected, because any IO it is handling will
not need COMMIT, so it just sends back PNFS_NOT_ATTEMPTED and lets
everything go to the MDS.

I glanced at the object code, and it also sends back PNFS_NOT_ATTEMPTED, so
there should be no change to current behavior, as it again just sends
it back up to go through the MDS.
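
To make the split concrete, here is a rough userspace sketch of how a
driver might bucket the pages of one commit by destination, treating a
NULL lseg as "send to the MDS".  Everything in it (demo_lseg, demo_page,
demo_split_commit, the ds_index field, the fixed DEMO_MAX_DS) is
hypothetical and only stands in for the real nfs_page/lseg structures;
it also assumes one data server per lseg, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's nfs_page or layout segment. */
struct demo_lseg { int ds_index; };                 /* assumed: one data server per lseg */
struct demo_page { const struct demo_lseg *lseg; }; /* NULL means "go through the MDS" */

#define DEMO_MAX_DS 4

struct demo_commit_plan {
	size_t to_mds;              /* pages with no lseg attached */
	size_t to_ds[DEMO_MAX_DS];  /* pages per data server */
};

/* Count how many pages each destination would have to commit. */
static void demo_split_commit(const struct demo_page *pages, size_t n,
			      struct demo_commit_plan *plan)
{
	size_t i;

	plan->to_mds = 0;
	for (i = 0; i < DEMO_MAX_DS; i++)
		plan->to_ds[i] = 0;

	for (i = 0; i < n; i++) {
		const struct demo_lseg *lseg = pages[i].lseg;

		if (!lseg) {
			/* No layout was held when this page was written. */
			plan->to_mds++;
			continue;
		}
		assert(lseg->ds_index >= 0 && lseg->ds_index < DEMO_MAX_DS);
		plan->to_ds[lseg->ds_index]++;
	}
}
```

A driver that never needs COMMIT would skip all of this and simply
report that it did not attempt the commit, leaving every page to the MDS.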

Fred

>>
>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>> ---
>>  fs/nfs/internal.h        |    2 +-
>>  fs/nfs/pagelist.c        |    5 ++-
>>  fs/nfs/pnfs.c            |   79 ++++++++-------------------------------------
>>  fs/nfs/pnfs.h            |   21 +++++-------
>>  fs/nfs/write.c           |   23 +++++++------
>>  include/linux/nfs_page.h |    3 +-
>>  6 files changed, 43 insertions(+), 90 deletions(-)
>>
>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>> index b754446..a30974a 100644
>> --- a/fs/nfs/internal.h
>> +++ b/fs/nfs/internal.h
>> @@ -286,7 +286,7 @@ extern int nfs_initiate_commit(struct nfs_write_data *data,
>>  extern int pnfs_initiate_commit(struct nfs_write_data *data,
>>                              struct rpc_clnt *clnt,
>>                              const struct rpc_call_ops *call_ops,
>> -                            int how);
>> +                             int how, int pnfs);
>>  extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
>>  extern void nfs_mark_list_commit(struct list_head *head);
>>  #ifdef CONFIG_MIGRATION
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index c3e5a1f..c8de900 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -380,6 +380,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>>   * @idx_start: lower bound of page->index to scan
>>   * @npages: idx_start + npages sets the upper bound to scan.
>>   * @tag: tag to scan for
>> + * @use_pnfs: will be set TRUE if commit needs to be handled by layout driver
>>   *
>>   * Moves elements from one of the inode request lists.
>>   * If the number of requests is set to 0, the entire address_space
>> @@ -389,7 +390,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>>   */
>>  int nfs_scan_list(struct nfs_inode *nfsi,
>>               struct list_head *dst, pgoff_t idx_start,
>> -             unsigned int npages, int tag)
>> +               unsigned int npages, int tag, int *use_pnfs)
>>  {
>>       struct nfs_page *pgvec[NFS_SCAN_MAXENTRIES];
>>       struct nfs_page *req;
>> @@ -420,6 +421,8 @@ int nfs_scan_list(struct nfs_inode *nfsi,
>>                               radix_tree_tag_clear(&nfsi->nfs_page_tree,
>>                                               req->wb_index, tag);
>>                               nfs_list_add_request(req, dst);
>> +                             if (req->wb_lseg)
>> +                                     *use_pnfs = 1;
>>                               res++;
>>                               if (res == INT_MAX)
>>                                       goto out;
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 4907e3a..9f28b28 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -1672,19 +1672,11 @@ enum pnfs_try_status
>>  _pnfs_try_to_commit(struct nfs_write_data *data,
>>                   const struct rpc_call_ops *call_ops, int how)
>>  {
>> -     struct inode *inode = data->inode;
>> -
>> -     if (!pnfs_enabled_sb(NFS_SERVER(inode))) {
>> -             dprintk("%s: Not using pNFS I/O\n", __func__);
>> -             return PNFS_NOT_ATTEMPTED;
>> -     } else {
>> -             /* data->call_ops and data->how set in nfs_commit_rpcsetup */
>> -             dprintk("%s: Utilizing pNFS I/O\n", __func__);
>> -             data->pdata.call_ops = call_ops;
>> -             data->pdata.pnfs_error = 0;
>> -             data->pdata.how = how;
>> -             return pnfs_commit(data, how);
>> -     }
>> +     dprintk("%s: Utilizing pNFS I/O\n", __func__);
>> +     data->pdata.call_ops = call_ops;
>> +     data->pdata.pnfs_error = 0;
>> +     data->pdata.how = how;
>> +     return pnfs_commit(data, how);
>>  }
>>
>>  /* pNFS Commit callback function for all layout drivers */
>> @@ -1705,76 +1697,33 @@ pnfs_commit_done(struct nfs_write_data *data)
>>               _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
>>                                   true);
>>               pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
>> -                                  pdata->call_ops, pdata->how);
>> +                                  pdata->call_ops, pdata->how, 1);
>>       }
>>  }
>>
>>  static enum pnfs_try_status
>>  pnfs_commit(struct nfs_write_data *data, int sync)
>>  {
>> -     int result;
>>       struct nfs_inode *nfsi = NFS_I(data->inode);
>>       struct nfs_server *nfss = NFS_SERVER(data->inode);
>> -     struct pnfs_layout_segment *lseg;
>> -     struct nfs_page *first, *last, *p;
>> -     int npages;
>>       enum pnfs_try_status trypnfs;
>> -     u64 count;
>>
>>       dprintk("%s: Begin\n", __func__);
>>
>> -     /* If the layout driver doesn't define its own commit function
>> -      * use standard NFSv4 commit
>> -      */
>> -     first = last = nfs_list_entry(data->pages.next);
>> -     npages = 0;
>> -     list_for_each_entry(p, &data->pages, wb_list) {
>> -             last = p;
>> -             npages++;
>> -     }
>> -     /* COMMIT indicates the whole file with offset = count = 0
>> -      * whereas layout segments indicate whole file with offset = 0,
>> -      * count = NFS4_MAX_UINT64.
>> +     /* We need to account for possibility that
>> +      * each nfs_page can point to a different lseg (or be NULL).
>> +      * For the immediate case of whole-file-only layouts, we at
>> +      * least know there can be only a single lseg.
>> +      * We still have to account for the possibility of some being NULL.
>> +      * This will be done by passing the buck to the layout driver.
>>        */
>> -     count = ((npages - 1) << PAGE_CACHE_SHIFT) + first->wb_bytes +
>> -              (first != last) ? last->wb_bytes : 0;
>> -     if (first->wb_offset == 0 && count == 0)
>> -             count = NFS4_MAX_UINT64;
>> -
>> -     /* FIXME: we really ought to keep the layout segment that we used
>> -        to write the page around for committing it and never ask for a
>> -        new one.  If it was recalled we better commit the data first
>> -        before returning it, otherwise the data needs to be rewritten,
>> -        either with a new layout or to the MDS */
>> -     result = _pnfs_update_layout(data->inode,
>> -                                 NULL,
>> -                                 count,
>> -                                 first->wb_offset,
>> -                                 IOMODE_RW,
>> -                                 &lseg);
>> -     /* If no layout have been retrieved,
>> -      * use standard NFSv4 commit
>> -      */
>> -     if (result) {
>> -             dprintk("%s: Updating layout failed (%d), retry with NFS \n",
>> -                     __func__, result);
>> -             trypnfs = PNFS_NOT_ATTEMPTED;
>> -             goto out;
>> -     }
>> -
>> -     dprintk("%s: Calling layout driver commit\n", __func__);
>> +     data->pdata.lseg = NULL;
>>       if (!pnfs_use_rpc(nfss))
>>               data->pdata.pnfsflags |= PNFS_NO_RPC;
>> -     data->pdata.lseg = lseg;
>>       trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
>>                                                       sync, data);
>> -     if (trypnfs == PNFS_NOT_ATTEMPTED) {
>> +     if (trypnfs == PNFS_NOT_ATTEMPTED)
>>               data->pdata.pnfsflags &= ~PNFS_NO_RPC;
>> -             data->pdata.lseg = NULL;
>> -             put_lseg(lseg);
>> -     }
>> -
>> -out:
>>       dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>>       return trypnfs;
>>  }
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index ea54210..e231ca3 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -140,21 +140,18 @@ pnfs_try_to_commit(struct nfs_write_data *data,
>>                  const struct rpc_call_ops *call_ops,
>>                  int how)
>>  {
>> -     struct inode *inode = data->inode;
>> -     struct nfs_server *nfss = NFS_SERVER(inode);
>>       enum pnfs_try_status ret;
>>
>> -     /* Note that we check for "write_pagelist" and not for "commit"
>> -        since if async writes were done and pages weren't marked as stable
>> -        the commit method MUST be defined by the LD */
>> -     /* FIXME: write_pagelist should probably be mandated */
>> -     if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
>> -             ret = _pnfs_try_to_commit(data, call_ops, how);
>> -     else
>> -             ret = PNFS_NOT_ATTEMPTED;
>> -
>> +     /* Unlike in pnfs_try_to_write_data and pnfs_try_to_read_data,
>> +      * we have no guarantee that all nfs_pages point to the same
>> +      * lseg.  However, if we reach here, we are guaranteed that at
>> +      * least one points to some lseg.
>> +      */
>> +     ret = _pnfs_try_to_commit(data, call_ops, how);
>>       if (ret == PNFS_ATTEMPTED)
>> -             nfs_inc_stats(inode, NFSIOS_PNFS_COMMIT);
>> +             nfs_inc_stats(data->inode, NFSIOS_PNFS_COMMIT);
>> +     else
>> +             _pnfs_clear_lseg_from_pages(&data->pages);
>>       return ret;
>>  }
>>
>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>> index 811c776..ebc9452 100644
>> --- a/fs/nfs/write.c
>> +++ b/fs/nfs/write.c
>> @@ -527,7 +527,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
>>   * The requests are *not* checked to ensure that they form a contiguous set.
>>   */
>>  static int
>> -nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
>> +nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages, int *use_pnfs)
>>  {
>>       struct nfs_inode *nfsi = NFS_I(inode);
>>       int ret;
>> @@ -535,7 +535,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
>>       if (!nfs_need_commit(nfsi))
>>               return 0;
>>
>> -     ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
>> +     ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT,
>> +                         use_pnfs);
>>       if (ret > 0)
>>               nfsi->ncommit -= ret;
>>       if (nfs_need_commit(NFS_I(inode)))
>> @@ -1334,9 +1335,10 @@ EXPORT_SYMBOL(nfs_initiate_commit);
>>  int pnfs_initiate_commit(struct nfs_write_data *data,
>>                        struct rpc_clnt *clnt,
>>                        const struct rpc_call_ops *call_ops,
>> -                      int how)
>> +                      int how, int pnfs)
>>  {
>> -     if (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED)
>> +     if (pnfs &&
>> +         (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED))
>>               return pnfs_get_write_status(data);
>>
>>       return nfs_initiate_commit(data, clnt, &nfs_commit_ops, how);
>> @@ -1347,7 +1349,7 @@ int pnfs_initiate_commit(struct nfs_write_data *data,
>>   */
>>  static int nfs_commit_rpcsetup(struct list_head *head,
>>               struct nfs_write_data *data,
>> -             int how)
>> +             int how, int pnfs)
>>  {
>>       struct nfs_page *first = nfs_list_entry(head->next);
>>       struct inode *inode = first->wb_context->path.dentry->d_inode;
>> @@ -1374,7 +1376,7 @@ static int nfs_commit_rpcsetup(struct list_head *head,
>>       data->args.context = first->wb_context;  /* used by commit done */
>>
>>       return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
>> -                                 how);
>> +                                 how, pnfs);
>>  }
>>
>>  /* Handle memory error during commit */
>> @@ -1398,7 +1400,7 @@ EXPORT_SYMBOL(nfs_mark_list_commit);
>>   * Commit dirty pages
>>   */
>>  static int
>> -nfs_commit_list(struct inode *inode, struct list_head *head, int how)
>> +nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
>>  {
>>       struct nfs_write_data   *data;
>>
>> @@ -1407,7 +1409,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
>>               goto out_bad;
>>
>>       /* Set up the argument struct */
>> -     return nfs_commit_rpcsetup(head, data, how);
>> +     return nfs_commit_rpcsetup(head, data, how, pnfs);
>>   out_bad:
>>       nfs_mark_list_commit(head);
>>       nfs_commit_clear_lock(NFS_I(inode));
>> @@ -1495,14 +1497,15 @@ static int nfs_commit_inode(struct inode *inode, int how)
>>       LIST_HEAD(head);
>>       int may_wait = how & FLUSH_SYNC;
>>       int res = 0;
>> +     int use_pnfs = 0;
>>
>>       if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
>>               goto out_mark_dirty;
>>       spin_lock(&inode->i_lock);
>> -     res = nfs_scan_commit(inode, &head, 0, 0);
>> +     res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
>>       spin_unlock(&inode->i_lock);
>>       if (res) {
>> -             int error = nfs_commit_list(inode, &head, how);
>> +             int error = nfs_commit_list(inode, &head, how, use_pnfs);
>>               if (error < 0)
>>                       return error;
>>               if (may_wait) {
>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>> index 18a455c..06e5157 100644
>> --- a/include/linux/nfs_page.h
>> +++ b/include/linux/nfs_page.h
>> @@ -83,7 +83,8 @@ extern      void nfs_release_request(struct nfs_page *req);
>>
>>
>>  extern       int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
>> -                       pgoff_t idx_start, unsigned int npages, int tag);
>> +                       pgoff_t idx_start, unsigned int npages, int tag,
>> +                       int *use_pnfs);
>>  extern       void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>                            struct inode *inode,
>>                            int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation
  2010-06-09 12:21                                           ` Fred Isaman
@ 2010-06-09 15:12                                             ` Boaz Harrosh
  2010-06-09 15:15                                               ` [PATCH] FIXME: pnfs-obj: Short circuit the objlayout_commit to be a no-op Boaz Harrosh
  0 siblings, 1 reply; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 15:12 UTC (permalink / raw
  To: Fred Isaman; +Cc: Benny Halevy, Fred Isaman, linux-nfs

On 06/09/2010 03:21 PM, Fred Isaman wrote:
> On Wed, Jun 9, 2010 at 5:09 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>> On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
>>> WARNING - this is an API change.
>>>
>>> The layout driver's commit operation no longer takes an lseg.
>>> This is because each nfs_page may or may not have an associated lseg.
>>> It is the layout driver's task to send commits to the appropriate place.
>>
>> So if the appropriate place for all pages that have no lseg associated
>> with them is the MDS why shouldn't the generic layer do that?
>>
>> Benny
>>
> 
> Parceling out the commit to different servers is a pnfs requirement,
> not a general code requirement.  While it is true that the general
> layer knows how to send to the MDS, it does not know how to split the
> COMMIT into pieces.
> 
> The filelayout driver already needs code to parcel out the commit to
> different data servers.  Adding the MDS is trivial.
> 
> The block driver is not affected, because any IO it is handling will
> not need COMMIT, so it just sends back PNFS_NOT_ATTEMPTED and lets
> everything go to the MDS.
> 
> I glanced at the object code, and it also sends back PNFS_NOT_ATTEMPTED, so
> there should be no change to current behavior, as it again just sends
> it back up to go through the MDS.
> 

No, Benny, that's the missing patch:
http://www.spinics.net/lists/linux-nfs/msg12817.html

The commit story starts at write_done. For example, bl_end_par_io_write()
does:
	wdata->verf.committed = NFS_FILE_SYNC;
(same in the current obj code)

This tells the generic layer "commit not needed, take care of the pages".
The layoutdriver->commit() is never called. Since 2.6.34 there has been a
bug in this path, and it's best to use the patch above.

Now if the layout driver instead does:
	wdata->verf.committed = NFS_UNSTABLE; //NFS_FILE_SYNC;

then ->commit() is called.

If commit() returns PNFS_NOT_ATTEMPTED, that's broken.

If commit() returns PNFS_ATTEMPTED, that's broken too unless the driver
eventually calls client_ops->nfs_commit_complete(wdata). Note that the
driver can't call nfs_commit_complete(wdata) from within commit(); it
will deadlock. It will have to schedule that call (see the patch above).
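
A rough userspace model of that flow, in case it helps: none of these
names exist in the kernel (demo_wdata, demo_driver_commit,
demo_run_deferred and friends are made up), and the "lock held" flag is
only an assumed stand-in for whatever the caller holds across ->commit(),
since the reason for the deadlock is not spelled out here.  The one-slot
queue plays the part of the scheduled call.

```c
#include <assert.h>
#include <stddef.h>

enum demo_stable { DEMO_UNSTABLE, DEMO_FILE_SYNC };
enum demo_try_status { DEMO_ATTEMPTED, DEMO_NOT_ATTEMPTED };

struct demo_wdata {
	enum demo_stable committed; /* set by the driver at write-done time */
	int commit_lock_held;       /* assumed: caller holds something across ->commit() */
	int completed;              /* set once the pages have been taken care of */
};

/* One-slot deferred-work queue standing in for a scheduled callback. */
static struct demo_wdata *demo_pending;

/* Completion must not run inside ->commit(); modeled by the flag check. */
static void demo_commit_complete(struct demo_wdata *w)
{
	assert(!w->commit_lock_held);
	w->completed = 1;
}

/* Driver ->commit(): queue the completion, then report ATTEMPTED. */
static enum demo_try_status demo_driver_commit(struct demo_wdata *w)
{
	demo_pending = w;
	return DEMO_ATTEMPTED;
}

/* Runs later, outside ->commit(), like scheduled work would. */
static void demo_run_deferred(void)
{
	struct demo_wdata *w = demo_pending;

	demo_pending = NULL;
	if (w)
		demo_commit_complete(w);
}

/* Generic layer: returns 1 if the driver's ->commit() was invoked. */
static int demo_generic_commit(struct demo_wdata *w)
{
	enum demo_try_status status;

	if (w->committed == DEMO_FILE_SYNC) {
		/* Commit not needed; pages are handled at write completion. */
		w->completed = 1;
		return 0;
	}
	w->commit_lock_held = 1;
	status = demo_driver_commit(w);
	w->commit_lock_held = 0;
	return status == DEMO_ATTEMPTED;
}
```

Calling demo_commit_complete() directly from demo_driver_commit() would
trip the assertion, which is the toy equivalent of the deadlock.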

Some points to note:
1. Whatever the driver decides, the generic layer needs to take care of the
   pages, either in nfs_commit_complete(wdata) or at write_complete() when
   committed == NFS_FILE_SYNC.

   Filelayout commits a violent layering violation by touching pages like that.
   It should call the generic layer just like the other drivers. That explains
   why we have bugs in the other drivers that are not fixed.

2. Block layer should not return committed = NFS_FILE_SYNC unless its
   block devices are battery-backed and resilient to crashes, like
   the Panasas OSDs. Otherwise it should issue a FLUSH command on commit,
   since without it it violates the NFS protocol requirement that the
   "client keeps data in cache until on disk, resilient to server crashes",
   along with all that commit dancing and hard stuff Trond was working on.

   The same applies to objlayout. There is an OSD attribute the driver must
   inspect which specifies whether memory is volatile; if it is, the driver
   should call OSD_FLUSH on commit. (On my todo list.)
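
As a sketch of the rule in point 2, assuming the driver can learn whether
the device cache is volatile: report FILE_SYNC at write-done only for
non-volatile storage, and flush on commit otherwise.  The demo_device
structure, its volatile_cache flag and the flush counter are all
hypothetical; a real driver would read the device attribute and issue an
actual FLUSH / OSD_FLUSH.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum demo_stable { DEMO_UNSTABLE, DEMO_FILE_SYNC };

struct demo_device {
	bool volatile_cache; /* true if a crash can lose acknowledged writes */
	int flushes;         /* counts flush commands the sketch would issue */
};

/* What the driver should report as "committed" when a write finishes. */
static enum demo_stable demo_write_done_stability(const struct demo_device *dev)
{
	assert(dev != NULL);
	return dev->volatile_cache ? DEMO_UNSTABLE : DEMO_FILE_SYNC;
}

/* Commit handler: only a volatile device needs an explicit flush. */
static void demo_commit(struct demo_device *dev)
{
	assert(dev != NULL);
	if (dev->volatile_cache)
		dev->flushes++;
}
```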

That's what happens now. Maybe Fred broke all of the above; I'm only up to
reviewing his 2nd patch, and will send comments.

(Benny, I'll post an updated version of the above patch as a reply.)
Boaz

> Fred
> 
>>>
>>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>>> ---
>>>  fs/nfs/internal.h        |    2 +-
>>>  fs/nfs/pagelist.c        |    5 ++-
>>>  fs/nfs/pnfs.c            |   79 ++++++++-------------------------------------
>>>  fs/nfs/pnfs.h            |   21 +++++-------
>>>  fs/nfs/write.c           |   23 +++++++------
>>>  include/linux/nfs_page.h |    3 +-
>>>  6 files changed, 43 insertions(+), 90 deletions(-)
>>>
>>> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
>>> index b754446..a30974a 100644
>>> --- a/fs/nfs/internal.h
>>> +++ b/fs/nfs/internal.h
>>> @@ -286,7 +286,7 @@ extern int nfs_initiate_commit(struct nfs_write_data *data,
>>>  extern int pnfs_initiate_commit(struct nfs_write_data *data,
>>>                              struct rpc_clnt *clnt,
>>>                              const struct rpc_call_ops *call_ops,
>>> -                            int how);
>>> +                             int how, int pnfs);
>>>  extern void nfs_write_prepare(struct rpc_task *task, void *calldata);
>>>  extern void nfs_mark_list_commit(struct list_head *head);
>>>  #ifdef CONFIG_MIGRATION
>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>> index c3e5a1f..c8de900 100644
>>> --- a/fs/nfs/pagelist.c
>>> +++ b/fs/nfs/pagelist.c
>>> @@ -380,6 +380,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>>>   * @idx_start: lower bound of page->index to scan
>>>   * @npages: idx_start + npages sets the upper bound to scan.
>>>   * @tag: tag to scan for
>>> + * @use_pnfs: will be set TRUE if commit needs to be handled by layout driver
>>>   *
>>>   * Moves elements from one of the inode request lists.
>>>   * If the number of requests is set to 0, the entire address_space
>>> @@ -389,7 +390,7 @@ void nfs_pageio_cond_complete(struct nfs_pageio_descriptor *desc, pgoff_t index)
>>>   */
>>>  int nfs_scan_list(struct nfs_inode *nfsi,
>>>               struct list_head *dst, pgoff_t idx_start,
>>> -             unsigned int npages, int tag)
>>> +               unsigned int npages, int tag, int *use_pnfs)
>>>  {
>>>       struct nfs_page *pgvec[NFS_SCAN_MAXENTRIES];
>>>       struct nfs_page *req;
>>> @@ -420,6 +421,8 @@ int nfs_scan_list(struct nfs_inode *nfsi,
>>>                               radix_tree_tag_clear(&nfsi->nfs_page_tree,
>>>                                               req->wb_index, tag);
>>>                               nfs_list_add_request(req, dst);
>>> +                             if (req->wb_lseg)
>>> +                                     *use_pnfs = 1;
>>>                               res++;
>>>                               if (res == INT_MAX)
>>>                                       goto out;
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 4907e3a..9f28b28 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -1672,19 +1672,11 @@ enum pnfs_try_status
>>>  _pnfs_try_to_commit(struct nfs_write_data *data,
>>>                   const struct rpc_call_ops *call_ops, int how)
>>>  {
>>> -     struct inode *inode = data->inode;
>>> -
>>> -     if (!pnfs_enabled_sb(NFS_SERVER(inode))) {
>>> -             dprintk("%s: Not using pNFS I/O\n", __func__);
>>> -             return PNFS_NOT_ATTEMPTED;
>>> -     } else {
>>> -             /* data->call_ops and data->how set in nfs_commit_rpcsetup */
>>> -             dprintk("%s: Utilizing pNFS I/O\n", __func__);
>>> -             data->pdata.call_ops = call_ops;
>>> -             data->pdata.pnfs_error = 0;
>>> -             data->pdata.how = how;
>>> -             return pnfs_commit(data, how);
>>> -     }
>>> +     dprintk("%s: Utilizing pNFS I/O\n", __func__);
>>> +     data->pdata.call_ops = call_ops;
>>> +     data->pdata.pnfs_error = 0;
>>> +     data->pdata.how = how;
>>> +     return pnfs_commit(data, how);
>>>  }
>>>
>>>  /* pNFS Commit callback function for all layout drivers */
>>> @@ -1705,76 +1697,33 @@ pnfs_commit_done(struct nfs_write_data *data)
>>>               _pnfs_return_layout(data->inode, &range, NULL, RETURN_FILE,
>>>                                   true);
>>>               pnfs_initiate_commit(data, NFS_CLIENT(data->inode),
>>> -                                  pdata->call_ops, pdata->how);
>>> +                                  pdata->call_ops, pdata->how, 1);
>>>       }
>>>  }
>>>
>>>  static enum pnfs_try_status
>>>  pnfs_commit(struct nfs_write_data *data, int sync)
>>>  {
>>> -     int result;
>>>       struct nfs_inode *nfsi = NFS_I(data->inode);
>>>       struct nfs_server *nfss = NFS_SERVER(data->inode);
>>> -     struct pnfs_layout_segment *lseg;
>>> -     struct nfs_page *first, *last, *p;
>>> -     int npages;
>>>       enum pnfs_try_status trypnfs;
>>> -     u64 count;
>>>
>>>       dprintk("%s: Begin\n", __func__);
>>>
>>> -     /* If the layout driver doesn't define its own commit function
>>> -      * use standard NFSv4 commit
>>> -      */
>>> -     first = last = nfs_list_entry(data->pages.next);
>>> -     npages = 0;
>>> -     list_for_each_entry(p, &data->pages, wb_list) {
>>> -             last = p;
>>> -             npages++;
>>> -     }
>>> -     /* COMMIT indicates the whole file with offset = count = 0
>>> -      * whereas layout segments indicate whole file with offset = 0,
>>> -      * count = NFS4_MAX_UINT64.
>>> +     /* We need to account for possibility that
>>> +      * each nfs_page can point to a different lseg (or be NULL).
>>> +      * For the immediate case of whole-file-only layouts, we at
>>> +      * least know there can be only a single lseg.
>>> +      * We still have to account for the possibility of some being NULL.
>>> +      * This will be done by passing the buck to the layout driver.
>>>        */
>>> -     count = ((npages - 1) << PAGE_CACHE_SHIFT) + first->wb_bytes +
>>> -              (first != last) ? last->wb_bytes : 0;
>>> -     if (first->wb_offset == 0 && count == 0)
>>> -             count = NFS4_MAX_UINT64;
>>> -
>>> -     /* FIXME: we really ought to keep the layout segment that we used
>>> -        to write the page around for committing it and never ask for a
>>> -        new one.  If it was recalled we better commit the data first
>>> -        before returning it, otherwise the data needs to be rewritten,
>>> -        either with a new layout or to the MDS */
>>> -     result = _pnfs_update_layout(data->inode,
>>> -                                 NULL,
>>> -                                 count,
>>> -                                 first->wb_offset,
>>> -                                 IOMODE_RW,
>>> -                                 &lseg);
>>> -     /* If no layout have been retrieved,
>>> -      * use standard NFSv4 commit
>>> -      */
>>> -     if (result) {
>>> -             dprintk("%s: Updating layout failed (%d), retry with NFS \n",
>>> -                     __func__, result);
>>> -             trypnfs = PNFS_NOT_ATTEMPTED;
>>> -             goto out;
>>> -     }
>>> -
>>> -     dprintk("%s: Calling layout driver commit\n", __func__);
>>> +     data->pdata.lseg = NULL;
>>>       if (!pnfs_use_rpc(nfss))
>>>               data->pdata.pnfsflags |= PNFS_NO_RPC;
>>> -     data->pdata.lseg = lseg;
>>>       trypnfs = nfss->pnfs_curr_ld->ld_io_ops->commit(&nfsi->layout,
>>>                                                       sync, data);
>>> -     if (trypnfs == PNFS_NOT_ATTEMPTED) {
>>> +     if (trypnfs == PNFS_NOT_ATTEMPTED)
>>>               data->pdata.pnfsflags &= ~PNFS_NO_RPC;
>>> -             data->pdata.lseg = NULL;
>>> -             put_lseg(lseg);
>>> -     }
>>> -
>>> -out:
>>>       dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>>>       return trypnfs;
>>>  }
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index ea54210..e231ca3 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -140,21 +140,18 @@ pnfs_try_to_commit(struct nfs_write_data *data,
>>>                  const struct rpc_call_ops *call_ops,
>>>                  int how)
>>>  {
>>> -     struct inode *inode = data->inode;
>>> -     struct nfs_server *nfss = NFS_SERVER(inode);
>>>       enum pnfs_try_status ret;
>>>
>>> -     /* Note that we check for "write_pagelist" and not for "commit"
>>> -        since if async writes were done and pages weren't marked as stable
>>> -        the commit method MUST be defined by the LD */
>>> -     /* FIXME: write_pagelist should probably be mandated */
>>> -     if (PNFS_EXISTS_LDIO_OP(nfss, write_pagelist))
>>> -             ret = _pnfs_try_to_commit(data, call_ops, how);
>>> -     else
>>> -             ret = PNFS_NOT_ATTEMPTED;
>>> -
>>> +     /* Unlike in pnfs_try_to_write_data and pnfs_try_to_read_data,
>>> +      * we have no guarantee that all nfs_pages point to the same
>>> +      * lseg.  However, if we reach here, we are guaranteed that at
>>> +      * least one points to some lseg.
>>> +      */
>>> +     ret = _pnfs_try_to_commit(data, call_ops, how);
>>>       if (ret == PNFS_ATTEMPTED)
>>> -             nfs_inc_stats(inode, NFSIOS_PNFS_COMMIT);
>>> +             nfs_inc_stats(data->inode, NFSIOS_PNFS_COMMIT);
>>> +     else
>>> +             _pnfs_clear_lseg_from_pages(&data->pages);
>>>       return ret;
>>>  }
>>>
>>> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
>>> index 811c776..ebc9452 100644
>>> --- a/fs/nfs/write.c
>>> +++ b/fs/nfs/write.c
>>> @@ -527,7 +527,7 @@ nfs_need_commit(struct nfs_inode *nfsi)
>>>   * The requests are *not* checked to ensure that they form a contiguous set.
>>>   */
>>>  static int
>>> -nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages)
>>> +nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, unsigned int npages, int *use_pnfs)
>>>  {
>>>       struct nfs_inode *nfsi = NFS_I(inode);
>>>       int ret;
>>> @@ -535,7 +535,8 @@ nfs_scan_commit(struct inode *inode, struct list_head *dst, pgoff_t idx_start, u
>>>       if (!nfs_need_commit(nfsi))
>>>               return 0;
>>>
>>> -     ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT);
>>> +     ret = nfs_scan_list(nfsi, dst, idx_start, npages, NFS_PAGE_TAG_COMMIT,
>>> +                         use_pnfs);
>>>       if (ret > 0)
>>>               nfsi->ncommit -= ret;
>>>       if (nfs_need_commit(NFS_I(inode)))
>>> @@ -1334,9 +1335,10 @@ EXPORT_SYMBOL(nfs_initiate_commit);
>>>  int pnfs_initiate_commit(struct nfs_write_data *data,
>>>                        struct rpc_clnt *clnt,
>>>                        const struct rpc_call_ops *call_ops,
>>> -                      int how)
>>> +                      int how, int pnfs)
>>>  {
>>> -     if (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED)
>>> +     if (pnfs &&
>>> +         (pnfs_try_to_commit(data, &nfs_commit_ops, how) == PNFS_ATTEMPTED))
>>>               return pnfs_get_write_status(data);
>>>
>>>       return nfs_initiate_commit(data, clnt, &nfs_commit_ops, how);
>>> @@ -1347,7 +1349,7 @@ int pnfs_initiate_commit(struct nfs_write_data *data,
>>>   */
>>>  static int nfs_commit_rpcsetup(struct list_head *head,
>>>               struct nfs_write_data *data,
>>> -             int how)
>>> +             int how, int pnfs)
>>>  {
>>>       struct nfs_page *first = nfs_list_entry(head->next);
>>>       struct inode *inode = first->wb_context->path.dentry->d_inode;
>>> @@ -1374,7 +1376,7 @@ static int nfs_commit_rpcsetup(struct list_head *head,
>>>       data->args.context = first->wb_context;  /* used by commit done */
>>>
>>>       return pnfs_initiate_commit(data, NFS_CLIENT(inode), &nfs_commit_ops,
>>> -                                 how);
>>> +                                 how, pnfs);
>>>  }
>>>
>>>  /* Handle memory error during commit */
>>> @@ -1398,7 +1400,7 @@ EXPORT_SYMBOL(nfs_mark_list_commit);
>>>   * Commit dirty pages
>>>   */
>>>  static int
>>> -nfs_commit_list(struct inode *inode, struct list_head *head, int how)
>>> +nfs_commit_list(struct inode *inode, struct list_head *head, int how, int pnfs)
>>>  {
>>>       struct nfs_write_data   *data;
>>>
>>> @@ -1407,7 +1409,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how)
>>>               goto out_bad;
>>>
>>>       /* Set up the argument struct */
>>> -     return nfs_commit_rpcsetup(head, data, how);
>>> +     return nfs_commit_rpcsetup(head, data, how, pnfs);
>>>   out_bad:
>>>       nfs_mark_list_commit(head);
>>>       nfs_commit_clear_lock(NFS_I(inode));
>>> @@ -1495,14 +1497,15 @@ static int nfs_commit_inode(struct inode *inode, int how)
>>>       LIST_HEAD(head);
>>>       int may_wait = how & FLUSH_SYNC;
>>>       int res = 0;
>>> +     int use_pnfs = 0;
>>>
>>>       if (!nfs_commit_set_lock(NFS_I(inode), may_wait))
>>>               goto out_mark_dirty;
>>>       spin_lock(&inode->i_lock);
>>> -     res = nfs_scan_commit(inode, &head, 0, 0);
>>> +     res = nfs_scan_commit(inode, &head, 0, 0, &use_pnfs);
>>>       spin_unlock(&inode->i_lock);
>>>       if (res) {
>>> -             int error = nfs_commit_list(inode, &head, how);
>>> +             int error = nfs_commit_list(inode, &head, how, use_pnfs);
>>>               if (error < 0)
>>>                       return error;
>>>               if (may_wait) {
>>> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
>>> index 18a455c..06e5157 100644
>>> --- a/include/linux/nfs_page.h
>>> +++ b/include/linux/nfs_page.h
>>> @@ -83,7 +83,8 @@ extern      void nfs_release_request(struct nfs_page *req);
>>>
>>>
>>>  extern       int nfs_scan_list(struct nfs_inode *nfsi, struct list_head *dst,
>>> -                       pgoff_t idx_start, unsigned int npages, int tag);
>>> +                       pgoff_t idx_start, unsigned int npages, int tag,
>>> +                       int *use_pnfs);
>>>  extern       void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>>>                            struct inode *inode,
>>>                            int (*doio)(struct inode *, struct list_head *, unsigned int, size_t, int),
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] FIXME: pnfs-obj: Short circuit the objlayout_commit to be a no-op
  2010-06-09 15:12                                             ` Boaz Harrosh
@ 2010-06-09 15:15                                               ` Boaz Harrosh
  0 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 15:15 UTC (permalink / raw
  To: Fred Isaman; +Cc: Benny Halevy, Fred Isaman, linux-nfs


Due to a bug in the generic client, returning NFS_FILE_SYNC from
write_done renders the system useless. When NFS_UNSTABLE is returned
instead, the generic layer follows up with a call to
objlayout_commit. At that point nfs_commit_complete() should be
called to report success, and PNFS_ATTEMPTED returned.

Since nfs_commit_complete cannot be called from within
objlayout_commit, it is scheduled on an rpc task to be called
asynchronously.
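
To illustrate the deferral described above, here is a minimal userspace
sketch, not the kernel code. It assumes a toy single-threaded queue in place
of the kernel workqueue; every demo_* name is hypothetical and exists only
for this example. It shows the two points the patch relies on: the commit
entry point only queues the completion and returns "attempted", and the
callback recovers the enclosing write data from the embedded work item in
two container_of-style steps.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(); hypothetical name. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy work item: a callback plus a link in a pending list. */
struct demo_work {
	void (*fn)(struct demo_work *work);
	struct demo_work *next;
};

/* Mirrors the nesting used in the patch: data -> task -> work item. */
struct demo_task {
	struct demo_work tk_work;
};

struct demo_write_data {
	int commit_completed;
	struct demo_task task;
};

static struct demo_work *demo_pending;	/* simple LIFO of queued work */

/* Queue the work item; nothing runs yet. */
static void demo_schedule_work(struct demo_work *work)
{
	work->next = demo_pending;
	demo_pending = work;
}

/* Run everything queued so far; returns the number of items run. */
static int demo_run_pending(void)
{
	int ran = 0;

	while (demo_pending) {
		struct demo_work *work = demo_pending;

		demo_pending = work->next;
		work->fn(work);
		ran++;
	}
	return ran;
}

/* Deferred completion: walk back from the work item to the write data. */
static void demo_commit_complete(struct demo_work *work)
{
	struct demo_task *task =
		demo_container_of(work, struct demo_task, tk_work);
	struct demo_write_data *wdata =
		demo_container_of(task, struct demo_write_data, task);

	wdata->commit_completed = 1;
}

/* Commit entry point: queue the completion and report "attempted" (1). */
static int demo_commit(struct demo_write_data *wdata)
{
	wdata->task.tk_work.fn = demo_commit_complete;
	demo_schedule_work(&wdata->task.tk_work);
	return 1;
}
```

The completion flag stays clear until demo_run_pending() is called, which is
the property that lets the commit call return before the completion runs.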

TODO:
  All this is good code, actually needed and missing from objio_osd.
  What's missing is the actual call to osd_flush() and the completion
  call on request return.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
---
 fs/nfs/objlayout/objio_osd.c |    2 +-
 fs/nfs/objlayout/objlayout.c |   17 ++++++++++++++++-
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/objlayout/objio_osd.c b/fs/nfs/objlayout/objio_osd.c
index 315f8c6..4e266a2 100644
--- a/fs/nfs/objlayout/objio_osd.c
+++ b/fs/nfs/objlayout/objio_osd.c
@@ -852,7 +852,7 @@ static ssize_t _write_done(struct objio_state *ios)
 	if (likely(!ret)) {
 		/* FIXME: should be based on the OSD's persistence model
 		 * See OSD2r05 Section 4.13 Data persistence model */
-		ios->ol_state.committed = NFS_FILE_SYNC;
+		ios->ol_state.committed = NFS_UNSTABLE; //NFS_FILE_SYNC;
 		status = ios->length;
 	} else {
 		status = ret;
diff --git a/fs/nfs/objlayout/objlayout.c b/fs/nfs/objlayout/objlayout.c
index 880d987..60f64b7 100644
--- a/fs/nfs/objlayout/objlayout.c
+++ b/fs/nfs/objlayout/objlayout.c
@@ -287,15 +287,30 @@ objlayout_io_set_result(struct objlayout_io_state *state, unsigned index,
 	}
 }
 
+static void _rpc_commit_complete(struct work_struct *work)
+{
+	struct rpc_task *task;
+	struct nfs_write_data *wdata;
+
+	dprintk("%s enter\n", __func__);
+	task = container_of(work, struct rpc_task, u.tk_work);
+	wdata = container_of(task, struct nfs_write_data, task);
+
+	pnfs_client_ops->nfs_commit_complete(wdata);
+}
+
 /*
  * Commit data remotely on OSDs
  */
 enum pnfs_try_status
 objlayout_commit(struct pnfs_layout_type *pnfslay,
 		 int sync,
-		 struct nfs_write_data *data)
+		 struct nfs_write_data *wdata)
 {
 	int status = PNFS_ATTEMPTED;
+
+	INIT_WORK(&wdata->task.u.tk_work, _rpc_commit_complete);
+	schedule_work(&wdata->task.u.tk_work);
 	dprintk("%s: Return %d\n", __func__, status);
 	return status;
 }
-- 
1.6.6.1



^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT"
  2010-06-08  4:18 ` [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT" Fred Isaman
  2010-06-08  4:18   ` [PATCH 02/24] Revert "pnfs: Enable O_DIRECT write path." Fred Isaman
@ 2010-06-09 18:06   ` Boaz Harrosh
  1 sibling, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 18:06 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On 06/08/2010 07:18 AM, Fred Isaman wrote:
> This reverts commit 05277f5f5236462a11e7a20ebe9009449f8a463d.
> 
> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> ---
>  fs/nfs/direct.c |   10 ----------
>  1 files changed, 0 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index e111e9f..02e5918 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -191,22 +191,12 @@ static ssize_t nfs_direct_wait(struct nfs_direct_req *dreq)
>  {
>  	ssize_t result = -EIOCBQUEUED;
>  
> -	if (!pnfs_use_rpc(NFS_SERVER(dreq->inode))) {
> -		/* FIXME: Right now non-rpc layout types must perform
> -		 * syncronous direct i/o.
> -		 * New pNFS callback to wait on outstanding requests?
> -		 */

Just a note for later:
The layout driver's read/write_pages calls take a sync_io flag (somewhere).
Both objlayout and pan_shim fully implement it: once xxx_pages returns,
the data is on disk. So there is no need for the "New pNFS callback to wait".

Boaz

> -		result = 0;
> -		goto set_result;
> -	}
> -
>  	/* Async requests don't wait here */
>  	if (dreq->iocb)
>  		goto out;
>  
>  	result = wait_for_completion_killable(&dreq->completion);
>  
> -set_result:
>  	if (!result)
>  		result = dreq->error;
>  	if (!result)


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error
  2010-06-08  4:19         ` [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error Fred Isaman
  2010-06-08  4:19           ` [PATCH 06/24] pnfs: filelayout: clean and breakup nfs4_pnfs_dserver_get Fred Isaman
@ 2010-06-09 18:18           ` Boaz Harrosh
  1 sibling, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 18:18 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On 06/08/2010 07:19 AM, Fred Isaman wrote:
> This should be squashed into my (or alexandros's)submission patches for
> version 2. Compensate for Alexandros returning error but assigning lseg.
> 
> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> ---
>  fs/nfs/pnfs.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 2f8fa3c..b990471 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1064,6 +1064,8 @@ pnfs_update_layout(struct inode *ino,
>  	DEFINE_WAIT(__wait);
>  	int result = 0;
>  
> +	if (take_ref)
> +		*lsegpp = NULL;
>  	lo = get_lock_alloc_layout(ino);
>  	if (IS_ERR(lo)) {
>  		dprintk("%s ERROR: can't get pnfs_layout_type\n", __func__);
> @@ -1078,6 +1080,7 @@ pnfs_update_layout(struct inode *ino,
>  			put_lseg(lseg);
>  
>  		/* someone is cleaning the layout */
> +		lseg = NULL;
>  		result = -EAGAIN;
>  		goto out_put;
>  	}

Please get this patchset organised:
* Cleanups which go in permanently - first / separate.
* Patches degrading B to A - later / separate.
  Plus the A to B patch if needed, or a note that a simple patch -R will do.
  (Or a TODO: note that an A to B patch is missing)

This kind of patch kills any ability to really see what is happening.

And, as far as code progression and understanding goes:
* Fred's patches should go first - We first get a layout.
* Then Alexandros's patches on top - Then we return / recall a layout.

Boaz

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions
  2010-06-08  4:19                     ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Fred Isaman
  2010-06-08  4:19                       ` [PATCH 12/24] pnfs_submit: stash and refcount lseg in read path Fred Isaman
@ 2010-06-09 18:58                       ` Boaz Harrosh
  2010-06-09 19:20                         ` Fred Isaman
  1 sibling, 1 reply; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 18:58 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On 06/08/2010 07:19 AM, Fred Isaman wrote:
> These will be used in the generic code.  Set so they will compile away to
> nothing if CONFIG_NFS_V4_1 not set.
> 
> This requires kref_put to be under lock.  See rule 3 of Documentation/kref.txt
> 

I don't see "rule 3" in here. Please explain how?

BTW: Even the "rule 3" bad example with the lists has a counter example in the Kernel,
     with lists and searches that do kref_put/get locklessly. (By each element taking a
     ref on its peer and taking the reference of the first one before the search)

> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> ---
>  fs/nfs/pnfs.c |   45 ++++++++++++++++++++++++++++++++-------------
>  fs/nfs/pnfs.h |   44 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 75 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 836cb0f..a74a4b6 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -436,7 +436,25 @@ destroy_lseg(struct kref *kref)
>  	PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
>  }
>  
> -static inline void
> +static void
> +put_lseg_locked(struct pnfs_layout_segment *lseg)
> +{
> +	bool do_wake_up;
> +	struct nfs_inode *nfsi;
> +
> +	if (!lseg)
> +		return;
> +
> +	dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
> +		atomic_read(&lseg->kref.refcount), lseg->valid);
> +	do_wake_up = !lseg->valid;
> +	nfsi = PNFS_NFS_INODE(lseg->layout);
> +	kref_put(&lseg->kref, destroy_lseg);
> +	if (do_wake_up)
> +		wake_up(&nfsi->lo_waitq);
> +}
> +
> +void
>  put_lseg(struct pnfs_layout_segment *lseg)
>  {
>  	bool do_wake_up;
> @@ -449,7 +467,9 @@ put_lseg(struct pnfs_layout_segment *lseg)
>  		atomic_read(&lseg->kref.refcount), lseg->valid);
>  	do_wake_up = !lseg->valid;
>  	nfsi = PNFS_NFS_INODE(lseg->layout);
> +	lock_current_layout(nfsi);
>  	kref_put(&lseg->kref, destroy_lseg);
> +	unlock_current_layout(nfsi);
>  	if (do_wake_up)
>  		wake_up(&nfsi->lo_waitq);
>  }
> @@ -674,7 +694,7 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
>  			lseg, lseg->range.iomode, lseg->range.offset,
>  			lseg->range.length);
>  		list_del(&lseg->fi_list);
> -		put_lseg(lseg);
> +		put_lseg_locked(lseg);
>  	}
>  
>  	dprintk("%s:Return\n", __func__);
> @@ -1033,7 +1053,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
>  		    (lseg->valid || !only_valid)) {
>  			ret = lseg;
>  			if (take_ref)
> -				kref_get(&ret->kref);
> +				get_lseg(ret);
>  			break;
>  		}
>  		if (cmp_layout(range, &lseg->range) > 0)
> @@ -1053,7 +1073,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
>   * returned to the caller.
>   */
>  int
> -pnfs_update_layout(struct inode *ino,
> +_pnfs_update_layout(struct inode *ino,
>  		   struct nfs_open_context *ctx,
>  		   u64 count,
>  		   loff_t pos,
> @@ -1085,8 +1105,7 @@ pnfs_update_layout(struct inode *ino,
>  	lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
>  	if (lseg && !lseg->valid) {
>  		if (take_ref)
> -			put_lseg(lseg);
> -
> +			put_lseg_locked(lseg);
>  		/* someone is cleaning the layout */
>  		lseg = NULL;
>  		result = -EAGAIN;
> @@ -1262,7 +1281,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
>  	init_lseg(lo, lseg);
>  	lseg->range = res->lseg;
>  	if (lgp->lsegpp) {
> -		kref_get(&lseg->kref);
> +		get_lseg(lseg);
>  		*lgp->lsegpp = lseg;
>  	}
>  
> @@ -1380,7 +1399,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
>  	readahead_range(inode, pages, &loff, &count);
>  
>  	if (count > 0) {
> -		status = pnfs_update_layout(inode, ctx, count,
> +		status = _pnfs_update_layout(inode, ctx, count,
>  						loff, IOMODE_READ, NULL);
>  		dprintk("%s virt update returned %d\n", __func__, status);
>  		if (status != 0)
> @@ -1438,7 +1457,7 @@ pnfs_update_layout_commit(struct inode *inode,
>  	if (start == 0 && count == 0)
>  		count = NFS4_MAX_UINT64;
>  
> -	status = pnfs_update_layout(inode, nfs_page->wb_context,
> +	status = _pnfs_update_layout(inode, nfs_page->wb_context,
>  				count,
>  				start,
>  				IOMODE_RW,
> @@ -1538,7 +1557,7 @@ pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
>  		goto out;
>  
>  	/* Retrieve and set layout if not allready cached */
> -	status = pnfs_update_layout(inode,
> +	status = _pnfs_update_layout(inode,
>  				    context,
>  				    count,
>  				    *pos,
> @@ -1580,7 +1599,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
>  		args->offset);
>  
>  	/* Retrieve and set layout if not allready cached */
> -	status = pnfs_update_layout(inode,
> +	status = _pnfs_update_layout(inode,
>  				    args->context,
>  				    args->count,
>  				    args->offset,
> @@ -1681,7 +1700,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>  		args->offset);
>  
>  	/* Retrieve and set layout if not allready cached */
> -	status = pnfs_update_layout(inode,
> +	status = _pnfs_update_layout(inode,
>  				    args->context,
>  				    args->count,
>  				    args->offset,
> @@ -1845,7 +1864,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
>  	   new one.  If it was recalled we better commit the data first
>  	   before returning it, otherwise the data needs to be rewritten,
>  	   either with a new layout or to the MDS */
> -	result = pnfs_update_layout(data->inode,
> +	result = _pnfs_update_layout(data->inode,
>  				    NULL,
>  				    count,
>  				    first->wb_offset,
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 214d567..6326ed5 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -31,7 +31,8 @@ extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait
>  /* pnfs.c */
>  extern const nfs4_stateid zero_stateid;
>  
> -int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
> +void put_lseg(struct pnfs_layout_segment *lseg);
> +int _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>  	u64 count, loff_t pos, enum pnfs_iomode access_type,
>  	struct pnfs_layout_segment **lsegpp);
>  
> @@ -81,6 +82,12 @@ static inline int lo_fail_bit(u32 iomode)
>  			 NFS_INO_RW_LAYOUT_FAILED : NFS_INO_RO_LAYOUT_FAILED;
>  }
>  
> +static inline void get_lseg(struct pnfs_layout_segment *lseg)
> +{
> +	if (lseg)

Really? In my experience that is shooting yourself in the foot.

I don't believe any code that decided to get an lseg could get there without one.
If it does, I want to crash.

In all instances of get_lseg in this patch, the callers already check.

> +		kref_get(&lseg->kref);
> +}
> +
>  /* Return true if a layout driver is being used for this mountpoint */
>  static inline int pnfs_enabled_sb(struct nfs_server *nfss)
>  {
> @@ -170,6 +177,23 @@ static inline int pnfs_return_layout(struct inode *ino,
>  	return 0;
>  }
>  
> +static inline int pnfs_update_layout(struct inode *ino,
> +	struct nfs_open_context *ctx,
> +	u64 count, loff_t pos, enum pnfs_iomode access_type,
> +	struct pnfs_layout_segment **lsegpp)
> +{
> +	struct nfs_server *nfss = NFS_SERVER(ino);
> +
> +	if (pnfs_enabled_sb(nfss))
> +		return _pnfs_update_layout(ino, ctx, count, pos,
> +					   access_type, lsegpp);
> +	else {
> +		if (lsegpp)
> +			*lsegpp = NULL;
> +		return 0;
> +	}
> +}
> +
>  static inline int pnfs_get_write_status(struct nfs_write_data *data)
>  {
>  	return data->pdata.pnfs_error;
> @@ -190,6 +214,24 @@ static inline int pnfs_use_rpc(struct nfs_server *nfss)
>  
>  #else  /* CONFIG_NFS_V4_1 */
>  
> +static inline void get_lseg(struct pnfs_layout_segment *lseg)
> +{
> +}
> +
> +static inline void put_lseg(struct pnfs_layout_segment *lseg)
> +{
> +}
> +
> +static inline int
> +pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
> +	u64 count, loff_t pos, enum pnfs_iomode access_type,
> +	struct pnfs_layout_segment **lsegpp)
> +{
> +	if (lsegpp)
> +		*lsegpp = NULL;
> +	return 0;
> +}
> +
>  static inline enum pnfs_try_status
>  pnfs_try_to_read_data(struct nfs_read_data *data,
>  		      const struct rpc_call_ops *call_ops)

Comments aside, good and needed stuff.
Boaz

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 13/24] pnfs_submit: read path changeover
  2010-06-08  4:19                         ` [PATCH 13/24] pnfs_submit: read path changeover Fred Isaman
  2010-06-08  4:19                           ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Fred Isaman
@ 2010-06-09 19:19                           ` Boaz Harrosh
  2010-06-09 19:29                             ` Fred Isaman
  1 sibling, 1 reply; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 19:19 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On 06/08/2010 07:19 AM, Fred Isaman wrote:
> Change readpages path to only call LAYOUTGET once.
> 
> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> ---
>  fs/nfs/pagelist.c |    2 ++
>  fs/nfs/pnfs.c     |   37 +++++++------------------------------
>  fs/nfs/pnfs.h     |   25 ++++++++++++++++---------
>  3 files changed, 25 insertions(+), 39 deletions(-)
> 
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index ed647b9..c3e5a1f 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -253,6 +253,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
>  		return 0;
>  	if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
>  		return 0;
> +	if (req->wb_lseg != prev->wb_lseg)
> +		return 0;
>  #ifdef CONFIG_NFS_V4_1
>  	if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
>  		return 0;
> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
> index 2b5f6fc..692a18e 100644
> --- a/fs/nfs/pnfs.c
> +++ b/fs/nfs/pnfs.c
> @@ -1689,7 +1689,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>  {
>  	struct nfs_readargs *args = &rdata->args;
>  	struct inode *inode = rdata->inode;
> -	int numpages, status, pgcount, temp;
> +	int numpages, pgcount, temp;
>  	struct nfs_server *nfss = NFS_SERVER(inode);
>  	struct nfs_inode *nfsi = NFS_I(inode);
>  	struct pnfs_layout_segment *lseg;
> @@ -1701,19 +1701,8 @@ pnfs_readpages(struct nfs_read_data *rdata)
>  		args->count,
>  		args->offset);
>  
> -	/* Retrieve and set layout if not allready cached */
> -	status = _pnfs_update_layout(inode,
> -				    args->context,
> -				    args->count,
> -				    args->offset,
> -				    IOMODE_READ,
> -				    &lseg);
> -	if (status) {
> -		dprintk("%s: Updating layout failed (%d), retry with NFS \n",
> -			__func__, status);
> -		trypnfs = PNFS_NOT_ATTEMPTED;
> -		goto out;
> -	}
> +	lseg = rdata->req->wb_lseg;
> +	get_lseg(lseg);
>  
>  	/* Determine number of pages. */
>  	pgcount = args->pgbase + args->count;
> @@ -1740,7 +1729,6 @@ pnfs_readpages(struct nfs_read_data *rdata)
>  		rdata->pdata.lseg = NULL;
>  		put_lseg(lseg);
>  	}
> - out:
>  	dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>  	return trypnfs;
>  }
> @@ -1749,21 +1737,10 @@ enum pnfs_try_status
>  _pnfs_try_to_read_data(struct nfs_read_data *data,
>  		       const struct rpc_call_ops *call_ops)
>  {
> -	struct inode *ino = data->inode;
> -	struct nfs_server *nfss = NFS_SERVER(ino);
> -
> -	dprintk("--> %s\n", __func__);
> -	/* Only create an rpc request if utilizing NFSv4 I/O */
> -	if (!pnfs_enabled_sb(nfss) ||
> -	    !nfss->pnfs_curr_ld->ld_io_ops->read_pagelist) {
> -		dprintk("<-- %s: not using pnfs\n", __func__);
> -		return PNFS_NOT_ATTEMPTED;
> -	} else {
> -		dprintk("%s: Utilizing pNFS I/O\n", __func__);
> -		data->pdata.call_ops = call_ops;
> -		data->pdata.pnfs_error = 0;
> -		return pnfs_readpages(data);
> -	}

Wahoo, nice stuff. Ha!

By now this can be folded into its only caller.
(And the caller de-inlined; what was that all about?)

> +	dprintk("%s: Utilizing pNFS I/O\n", __func__);
> +	data->pdata.call_ops = call_ops;
> +	data->pdata.pnfs_error = 0;
> +	return pnfs_readpages(data);
>  }
>  
>  enum pnfs_try_status
> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
> index 6326ed5..816ebe1 100644
> --- a/fs/nfs/pnfs.h
> +++ b/fs/nfs/pnfs.h
> @@ -94,22 +94,29 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
>  	return nfss->pnfs_curr_ld != NULL;
>  }
>  
> +static inline void _pnfs_clear_lseg_from_pages(struct list_head *head)
> +{
> +	struct nfs_page *req;
> +
> +	list_for_each_entry(req, head, wb_list) {
> +		put_lseg(req->wb_lseg);
> +		req->wb_lseg = NULL;
> +	}
> +}
> +
>  static inline enum pnfs_try_status
>  pnfs_try_to_read_data(struct nfs_read_data *data,
>  		      const struct rpc_call_ops *call_ops)

I don't think this needs to be inline; what's the point?

>  {
> -	struct inode *inode = data->inode;
> -	struct nfs_server *nfss = NFS_SERVER(inode);
>  	enum pnfs_try_status ret;
>  
> -	/* FIXME: read_pagelist should probably be mandated */
> -	if (PNFS_EXISTS_LDIO_OP(nfss, read_pagelist))
> -		ret = _pnfs_try_to_read_data(data, call_ops);
> -	else
> -		ret = PNFS_NOT_ATTEMPTED;
> -
> +	if (!data->req->wb_lseg)
> +		return PNFS_NOT_ATTEMPTED;
> +	ret = _pnfs_try_to_read_data(data, call_ops);
>  	if (ret == PNFS_ATTEMPTED)
> -		nfs_inc_stats(inode, NFSIOS_PNFS_READ);
> +		nfs_inc_stats(data->inode, NFSIOS_PNFS_READ);
> +	else
> +		_pnfs_clear_lseg_from_pages(&data->pages);
>  	return ret;
>  }
>  

Boaz

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions
  2010-06-09 18:58                       ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Boaz Harrosh
@ 2010-06-09 19:20                         ` Fred Isaman
  0 siblings, 0 replies; 46+ messages in thread
From: Fred Isaman @ 2010-06-09 19:20 UTC (permalink / raw
  To: Boaz Harrosh; +Cc: Fred Isaman, linux-nfs

On Wed, Jun 9, 2010 at 2:58 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On 06/08/2010 07:19 AM, Fred Isaman wrote:
>> These will be used in the generic code.  Set so they will compile away to
>> nothing if CONFIG_NFS_V4_1 not set.
>>
>> This requires kref_put to be under lock.  See rule 3 of Documentation/kref.txt
>>
>
> I don't see "rule 3" in here. Please explain how?

3) If the code attempts to gain a reference to a kref-ed structure
   without already holding a valid pointer, it must serialize access
   where a kref_put() cannot occur during the kref_get(), and the
   structure must remain valid during the kref_get().

This occurs every time we call pnfs_update_layout
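
As a rough illustration of rule 3 (a userspace sketch, not the pnfs code):
a lookup that does not yet hold a reference must find the object and take
its reference under the same lock that every put runs under, so the final
put cannot slip in between the find and the get. Everything named demo_*
is hypothetical, the "list" holds at most one entry, and a C11 atomic_flag
spinlock stands in for the layout lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy refcounted segment; refcount is protected by demo_lock. */
struct demo_lseg {
	int refcount;
	int freed;		/* stands in for actually freeing the memory */
};

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static struct demo_lseg *demo_list_head;	/* one-entry "list" */

static void demo_lock_layout(void)
{
	while (atomic_flag_test_and_set(&demo_lock))
		;	/* spin until the flag is released */
}

static void demo_unlock_layout(void)
{
	atomic_flag_clear(&demo_lock);
}

/* Drop one reference; the caller must already hold demo_lock. */
static void demo_put_locked(struct demo_lseg *lseg)
{
	if (--lseg->refcount == 0)
		lseg->freed = 1;
}

/* Drop one reference from a context that does not hold the lock. */
static void demo_put(struct demo_lseg *lseg)
{
	if (!lseg)
		return;
	demo_lock_layout();
	demo_put_locked(lseg);
	demo_unlock_layout();
}

/* Publish the segment; the list owns the initial reference. */
static void demo_insert(struct demo_lseg *lseg)
{
	demo_lock_layout();
	lseg->refcount = 1;
	lseg->freed = 0;
	demo_list_head = lseg;
	demo_unlock_layout();
}

/* Find and get in one critical section: no put can run in between. */
static struct demo_lseg *demo_find_get(void)
{
	struct demo_lseg *lseg;

	demo_lock_layout();
	lseg = demo_list_head;
	if (lseg)
		lseg->refcount++;
	demo_unlock_layout();
	return lseg;
}

/* Unlink and drop the list's reference, all under the lock. */
static void demo_unlink(void)
{
	struct demo_lseg *lseg;

	demo_lock_layout();
	lseg = demo_list_head;
	demo_list_head = NULL;
	if (lseg)
		demo_put_locked(lseg);
	demo_unlock_layout();
}
```

A caller that already holds a valid reference could take another one without
the lock; it is only the find-then-get path that needs the serialization.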


>
> BTW: Even the "rule 3" bad example with the lists has a counter example in the Kernel,
>      with lists and searches that do kref_put/get locklessly. (By each element taking a
>      ref on its peer and taking the reference of the first one before the search)
>

I don't follow this.

>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>> ---
>> =A0fs/nfs/pnfs.c | =A0 45 ++++++++++++++++++++++++++++++++----------=
---
>> =A0fs/nfs/pnfs.h | =A0 44 ++++++++++++++++++++++++++++++++++++++++++=
+-
>> =A02 files changed, 75 insertions(+), 14 deletions(-)
>>
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 836cb0f..a74a4b6 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -436,7 +436,25 @@ destroy_lseg(struct kref *kref)
>> =A0 =A0 =A0 PNFS_LD_IO_OPS(lseg->layout)->free_lseg(lseg);
>> =A0}
>>
>> -static inline void
>> +static void
>> +put_lseg_locked(struct pnfs_layout_segment *lseg)
>> +{
>> + =A0 =A0 bool do_wake_up;
>> + =A0 =A0 struct nfs_inode *nfsi;
>> +
>> + =A0 =A0 if (!lseg)
>> + =A0 =A0 =A0 =A0 =A0 =A0 return;
>> +
>> + =A0 =A0 dprintk("%s: lseg %p ref %d valid %d\n", __func__, lseg,
>> + =A0 =A0 =A0 =A0 =A0 =A0 atomic_read(&lseg->kref.refcount), lseg->v=
alid);
>> + =A0 =A0 do_wake_up =3D !lseg->valid;
>> + =A0 =A0 nfsi =3D PNFS_NFS_INODE(lseg->layout);
>> + =A0 =A0 kref_put(&lseg->kref, destroy_lseg);
>> + =A0 =A0 if (do_wake_up)
>> + =A0 =A0 =A0 =A0 =A0 =A0 wake_up(&nfsi->lo_waitq);
>> +}
>> +
>> +void
>> =A0put_lseg(struct pnfs_layout_segment *lseg)
>> =A0{
>> =A0 =A0 =A0 bool do_wake_up;
>> @@ -449,7 +467,9 @@ put_lseg(struct pnfs_layout_segment *lseg)
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 atomic_read(&lseg->kref.refcount), lseg-=
>valid);
>> =A0 =A0 =A0 do_wake_up =3D !lseg->valid;
>> =A0 =A0 =A0 nfsi =3D PNFS_NFS_INODE(lseg->layout);
>> + =A0 =A0 lock_current_layout(nfsi);
>> =A0 =A0 =A0 kref_put(&lseg->kref, destroy_lseg);
>> + =A0 =A0 unlock_current_layout(nfsi);
>> =A0 =A0 =A0 if (do_wake_up)
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 wake_up(&nfsi->lo_waitq);
>> =A0}
>> @@ -674,7 +694,7 @@ pnfs_free_layout(struct pnfs_layout_type *lo,
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lseg, lseg->range.iomode=
, lseg->range.offset,
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 lseg->range.length);
>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 list_del(&lseg->fi_list);
>> - =A0 =A0 =A0 =A0 =A0 =A0 put_lseg(lseg);
>> + =A0 =A0 =A0 =A0 =A0 =A0 put_lseg_locked(lseg);
>> =A0 =A0 =A0 }
>>
>> =A0 =A0 =A0 dprintk("%s:Return\n", __func__);
>> @@ -1033,7 +1053,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
>>                   (lseg->valid || !only_valid)) {
>>                       ret = lseg;
>>                       if (take_ref)
>> -                             kref_get(&ret->kref);
>> +                             get_lseg(ret);
>>                       break;
>>               }
>>               if (cmp_layout(range, &lseg->range) > 0)
>> @@ -1053,7 +1073,7 @@ pnfs_has_layout(struct pnfs_layout_type *lo,
>>   * returned to the caller.
>>   */
>>  int
>> -pnfs_update_layout(struct inode *ino,
>> +_pnfs_update_layout(struct inode *ino,
>>                  struct nfs_open_context *ctx,
>>                  u64 count,
>>                  loff_t pos,
>> @@ -1085,8 +1105,7 @@ pnfs_update_layout(struct inode *ino,
>>       lseg = pnfs_has_layout(lo, &arg, take_ref, !take_ref);
>>       if (lseg && !lseg->valid) {
>>               if (take_ref)
>> -                     put_lseg(lseg);
>> -
>> +                     put_lseg_locked(lseg);
>>               /* someone is cleaning the layout */
>>               lseg = NULL;
>>               result = -EAGAIN;
>> @@ -1262,7 +1281,7 @@ pnfs_layout_process(struct nfs4_pnfs_layoutget *lgp)
>>       init_lseg(lo, lseg);
>>       lseg->range = res->lseg;
>>       if (lgp->lsegpp) {
>> -             kref_get(&lseg->kref);
>> +             get_lseg(lseg);
>>               *lgp->lsegpp = lseg;
>>       }
>>
>> @@ -1380,7 +1399,7 @@ pnfs_pageio_init_read(struct nfs_pageio_descriptor *pgio,
>>       readahead_range(inode, pages, &loff, &count);
>>
>>       if (count > 0) {
>> -             status = pnfs_update_layout(inode, ctx, count,
>> +             status = _pnfs_update_layout(inode, ctx, count,
>>                                               loff, IOMODE_READ, NULL);
>>               dprintk("%s virt update returned %d\n", __func__, status);
>>               if (status != 0)
>> @@ -1438,7 +1457,7 @@ pnfs_update_layout_commit(struct inode *inode,
>>       if (start == 0 && count == 0)
>>               count = NFS4_MAX_UINT64;
>>
>> -     status = pnfs_update_layout(inode, nfs_page->wb_context,
>> +     status = _pnfs_update_layout(inode, nfs_page->wb_context,
>>                               count,
>>                               start,
>>                               IOMODE_RW,
>> @@ -1538,7 +1557,7 @@ pnfs_file_write(struct file *filp, const char __user *buf, size_t count,
>>               goto out;
>>
>>       /* Retrieve and set layout if not allready cached */
>> -     status = pnfs_update_layout(inode,
>> +     status = _pnfs_update_layout(inode,
>>                                   context,
>>                                   count,
>>                                   *pos,
>> @@ -1580,7 +1599,7 @@ pnfs_writepages(struct nfs_write_data *wdata, int how)
>>               args->offset);
>>
>>       /* Retrieve and set layout if not allready cached */
>> -     status = pnfs_update_layout(inode,
>> +     status = _pnfs_update_layout(inode,
>>                                   args->context,
>>                                   args->count,
>>                                   args->offset,
>> @@ -1681,7 +1700,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>               args->offset);
>>
>>       /* Retrieve and set layout if not allready cached */
>> -     status = pnfs_update_layout(inode,
>> +     status = _pnfs_update_layout(inode,
>>                                   args->context,
>>                                   args->count,
>>                                   args->offset,
>> @@ -1845,7 +1864,7 @@ pnfs_commit(struct nfs_write_data *data, int sync)
>>          new one.  If it was recalled we better commit the data first
>>          before returning it, otherwise the data needs to be rewritten,
>>          either with a new layout or to the MDS */
>> -     result = pnfs_update_layout(data->inode,
>> +     result = _pnfs_update_layout(data->inode,
>>                                   NULL,
>>                                   count,
>>                                   first->wb_offset,
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 214d567..6326ed5 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -31,7 +31,8 @@ extern int pnfs4_proc_layoutreturn(struct nfs4_pnfs_layoutreturn *lrp, bool wait
>>  /* pnfs.c */
>>  extern const nfs4_stateid zero_stateid;
>>
>> -int pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>> +void put_lseg(struct pnfs_layout_segment *lseg);
>> +int _pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>>       u64 count, loff_t pos, enum pnfs_iomode access_type,
>>       struct pnfs_layout_segment **lsegpp);
>>
>> @@ -81,6 +82,12 @@ static inline int lo_fail_bit(u32 iomode)
>>                        NFS_INO_RW_LAYOUT_FAILED : NFS_INO_RO_LAYOUT_FAILED;
>>  }
>>
>> +static inline void get_lseg(struct pnfs_layout_segment *lseg)
>> +{
>> +     if (lseg)
>
> Really? In my experience that is a shot in the foot.
>
> I don't believe any code that decided to get an lseg could get there without one.
> If it does, I want to crash.
>
> All the callers of get_lseg in this patch already check.
>

It is needed by one instance from the following patch.  I can change this.

Fred
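
To make the two positions in this exchange concrete, here is a minimal
user-space sketch. The sketch_ names and the plain int counter are
assumptions for illustration only; the patch itself uses kref_get() on
struct pnfs_layout_segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pnfs_layout_segment; not the kernel type. */
struct sketch_lseg {
	int refcount;
};

/* NULL-tolerant variant, as in the patch: a NULL segment is silently ignored. */
static inline void sketch_get_lseg(struct sketch_lseg *lseg)
{
	if (lseg)
		lseg->refcount++;
}

/* Strict variant, as the reviewer prefers: NULL is a caller bug, so fail loudly. */
static inline void sketch_get_lseg_strict(struct sketch_lseg *lseg)
{
	assert(lseg != NULL);
	lseg->refcount++;
}
```

The tolerant form lets a caller pass a possibly-NULL segment (NULL meaning
"go through the MDS") without its own check; the strict form turns a
missing segment into an immediate, visible failure.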

>> +             kref_get(&lseg->kref);
>> +}
>> +
>>  /* Return true if a layout driver is being used for this mountpoint */
>>  static inline int pnfs_enabled_sb(struct nfs_server *nfss)
>>  {
>> @@ -170,6 +177,23 @@ static inline int pnfs_return_layout(struct inode *ino,
>>       return 0;
>>  }
>>
>> +static inline int pnfs_update_layout(struct inode *ino,
>> +     struct nfs_open_context *ctx,
>> +     u64 count, loff_t pos, enum pnfs_iomode access_type,
>> +     struct pnfs_layout_segment **lsegpp)
>> +{
>> +     struct nfs_server *nfss = NFS_SERVER(ino);
>> +
>> +     if (pnfs_enabled_sb(nfss))
>> +             return _pnfs_update_layout(ino, ctx, count, pos,
>> +                                        access_type, lsegpp);
>> +     else {
>> +             if (lsegpp)
>> +                     *lsegpp = NULL;
>> +             return 0;
>> +     }
>> +}
>> +
>>  static inline int pnfs_get_write_status(struct nfs_write_data *data)
>>  {
>>       return data->pdata.pnfs_error;
>> @@ -190,6 +214,24 @@ static inline int pnfs_use_rpc(struct nfs_server *nfss)
>>
>>  #else  /* CONFIG_NFS_V4_1 */
>>
>> +static inline void get_lseg(struct pnfs_layout_segment *lseg)
>> +{
>> +}
>> +
>> +static inline void put_lseg(struct pnfs_layout_segment *lseg)
>> +{
>> +}
>> +
>> +static inline int
>> +pnfs_update_layout(struct inode *ino, struct nfs_open_context *ctx,
>> +     u64 count, loff_t pos, enum pnfs_iomode access_type,
>> +     struct pnfs_layout_segment **lsegpp)
>> +{
>> +     if (lsegpp)
>> +             *lsegpp = NULL;
>> +     return 0;
>> +}
>> +
>>  static inline enum pnfs_try_status
>>  pnfs_try_to_read_data(struct nfs_read_data *data,
>>                     const struct rpc_call_ops *call_ops)
>
> Comments aside. Good needed stuff
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH 13/24] pnfs_submit: read path changeover
  2010-06-09 19:19                           ` [PATCH 13/24] pnfs_submit: read path changeover Boaz Harrosh
@ 2010-06-09 19:29                             ` Fred Isaman
       [not found]                               ` <AANLkTilecdPbSOJCDkGYH-X25gcZB-1fmBmU9mEpFO_y-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-09 19:29 UTC (permalink / raw
  To: Boaz Harrosh; +Cc: Fred Isaman, linux-nfs

On Wed, Jun 9, 2010 at 3:19 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On 06/08/2010 07:19 AM, Fred Isaman wrote:
>> Change readpages path to only call LAYOUTGET once.
>>
>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>> ---
>>  fs/nfs/pagelist.c |    2 ++
>>  fs/nfs/pnfs.c     |   37 +++++++------------------------------
>>  fs/nfs/pnfs.h     |   25 ++++++++++++++++---------
>>  3 files changed, 25 insertions(+), 39 deletions(-)
>>
>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>> index ed647b9..c3e5a1f 100644
>> --- a/fs/nfs/pagelist.c
>> +++ b/fs/nfs/pagelist.c
>> @@ -253,6 +253,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
>>               return 0;
>>       if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
>>               return 0;
>> +     if (req->wb_lseg != prev->wb_lseg)
>> +             return 0;
>>  #ifdef CONFIG_NFS_V4_1
>>       if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
>>               return 0;
>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>> index 2b5f6fc..692a18e 100644
>> --- a/fs/nfs/pnfs.c
>> +++ b/fs/nfs/pnfs.c
>> @@ -1689,7 +1689,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>  {
>>       struct nfs_readargs *args = &rdata->args;
>>       struct inode *inode = rdata->inode;
>> -     int numpages, status, pgcount, temp;
>> +     int numpages, pgcount, temp;
>>       struct nfs_server *nfss = NFS_SERVER(inode);
>>       struct nfs_inode *nfsi = NFS_I(inode);
>>       struct pnfs_layout_segment *lseg;
>> @@ -1701,19 +1701,8 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>               args->count,
>>               args->offset);
>>
>> -     /* Retrieve and set layout if not allready cached */
>> -     status = _pnfs_update_layout(inode,
>> -                                 args->context,
>> -                                 args->count,
>> -                                 args->offset,
>> -                                 IOMODE_READ,
>> -                                 &lseg);
>> -     if (status) {
>> -             dprintk("%s: Updating layout failed (%d), retry with NFS \n",
>> -                     __func__, status);
>> -             trypnfs = PNFS_NOT_ATTEMPTED;
>> -             goto out;
>> -     }
>> +     lseg = rdata->req->wb_lseg;
>> +     get_lseg(lseg);
>>
>>       /* Determine number of pages. */
>>       pgcount = args->pgbase + args->count;
>> @@ -1740,7 +1729,6 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>               rdata->pdata.lseg = NULL;
>>               put_lseg(lseg);
>>       }
>> - out:
>>       dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>>       return trypnfs;
>>  }
>> @@ -1749,21 +1737,10 @@ enum pnfs_try_status
>>  _pnfs_try_to_read_data(struct nfs_read_data *data,
>>                      const struct rpc_call_ops *call_ops)
>>  {
>> -     struct inode *ino = data->inode;
>> -     struct nfs_server *nfss = NFS_SERVER(ino);
>> -
>> -     dprintk("--> %s\n", __func__);
>> -     /* Only create an rpc request if utilizing NFSv4 I/O */
>> -     if (!pnfs_enabled_sb(nfss) ||
>> -         !nfss->pnfs_curr_ld->ld_io_ops->read_pagelist) {
>> -             dprintk("<-- %s: not using pnfs\n", __func__);
>> -             return PNFS_NOT_ATTEMPTED;
>> -     } else {
>> -             dprintk("%s: Utilizing pNFS I/O\n", __func__);
>> -             data->pdata.call_ops = call_ops;
>> -             data->pdata.pnfs_error = 0;
>> -             return pnfs_readpages(data);
>> -     }
>
> Wahoo, nice stuff Ha
>
> By now this can go into its only caller.
> (And the caller de-inlined; what was that all about?)
>
>> +     dprintk("%s: Utilizing pNFS I/O\n", __func__);
>> +     data->pdata.call_ops = call_ops;
>> +     data->pdata.pnfs_error = 0;
>> +     return pnfs_readpages(data);
>>  }
>>
>>  enum pnfs_try_status
>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>> index 6326ed5..816ebe1 100644
>> --- a/fs/nfs/pnfs.h
>> +++ b/fs/nfs/pnfs.h
>> @@ -94,22 +94,29 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
>>       return nfss->pnfs_curr_ld != NULL;
>>  }
>>
>> +static inline void _pnfs_clear_lseg_from_pages(struct list_head *head)
>> +{
>> +     struct nfs_page *req;
>> +
>> +     list_for_each_entry(req, head, wb_list) {
>> +             put_lseg(req->wb_lseg);
>> +             req->wb_lseg = NULL;
>> +     }
>> +}
>> +
>>  static inline enum pnfs_try_status
>>  pnfs_try_to_read_data(struct nfs_read_data *data,
>>                     const struct rpc_call_ops *call_ops)
>
> Don't think this needs to be inline; what's the point?
>

The point is that it is in the header file, not a .c file.

Fred

>>  {
>> -     struct inode *inode = data->inode;
>> -     struct nfs_server *nfss = NFS_SERVER(inode);
>>       enum pnfs_try_status ret;
>>
>> -     /* FIXME: read_pagelist should probably be mandated */
>> -     if (PNFS_EXISTS_LDIO_OP(nfss, read_pagelist))
>> -             ret = _pnfs_try_to_read_data(data, call_ops);
>> -     else
>> -             ret = PNFS_NOT_ATTEMPTED;
>> -
>> +     if (!data->req->wb_lseg)
>> +             return PNFS_NOT_ATTEMPTED;
>> +     ret = _pnfs_try_to_read_data(data, call_ops);
>>       if (ret == PNFS_ATTEMPTED)
>> -             nfs_inc_stats(inode, NFSIOS_PNFS_READ);
>> +             nfs_inc_stats(data->inode, NFSIOS_PNFS_READ);
>> +     else
>> +             _pnfs_clear_lseg_from_pages(&data->pages);
>>       return ret;
>>  }
>>
>
> Boaz
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


* Re: [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-08  4:19                           ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Fred Isaman
  2010-06-08  4:19                             ` [PATCH 15/24] pnfs_submit: stash and refcount lseg in write path Fred Isaman
  2010-06-09 10:38                             ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Benny Halevy
@ 2010-06-09 19:33                             ` Boaz Harrosh
  2 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 19:33 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On 06/08/2010 07:19 AM, Fred Isaman wrote:
> Preparing for LAYOUTGET invocation in nfs_write_begin to be the
> only invocation in the write path.
> 
> It isn't used at all yet, but it should be properly referenced/dereferenced.
> 
> Signed-off-by: Fred Isaman <iisaman@netapp.com>
> ---
>  fs/nfs/file.c |   16 +++++++++++++---
>  1 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index 03601d2..fde6cb5 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -420,6 +420,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>  		file->f_path.dentry->d_name.name,
>  		mapping->host->i_ino, len, (long long) pos);
>  
> +	pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
> +			   (struct pnfs_layout_segment **) fsdata);

A write layout for the whole file?

Please let's separate this into two parts.
A) Something we can all test, which should be clean and better than today.
B) B to A revert patch.

I'd say ask for the page's index. The server can set policy just fine.

What if I write to a hole at 4GB? Do you still want offset zero?

(BTW the order of params to pnfs_update_layout is all wrong)

Boaz
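
The suggestion above, asking for a layout that covers the page being
written rather than the range starting at offset zero, might look roughly
like the following. This is only a sketch under stated assumptions: the
helper, the struct, and the fixed 4 KiB page size are hypothetical and are
not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumed page size, for illustration */

/* Hypothetical byte range to request in LAYOUTGET. */
struct sketch_range {
	uint64_t offset;
	uint64_t length;
};

/* Round [pos, pos + len) outward to whole pages instead of asking for [0, max). */
static struct sketch_range sketch_write_range(uint64_t pos, uint64_t len)
{
	struct sketch_range r;
	uint64_t end;

	assert(len > 0);	/* an empty write has no range to request */

	r.offset = pos & ~(SKETCH_PAGE_SIZE - 1);
	end = (pos + len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
	r.length = end - r.offset;
	return r;
}
```

With this shape, a small write into a hole at 4GB requests one page at
that offset, and the server remains free to return a larger segment if its
policy prefers. Overflow near the top of the 64-bit range is ignored here.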

>  start:
>  	/*
>  	 * Prevent starvation issues if someone is doing a consistency
> @@ -428,11 +430,13 @@ start:
>  	ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
>  			nfs_wait_bit_killable, TASK_KILLABLE);
>  	if (ret)
> -		return ret;
> +		goto out;
>  
>  	page = grab_cache_page_write_begin(mapping, index, flags);
> -	if (!page)
> -		return -ENOMEM;
> +	if (!page) {
> +		ret = -ENOMEM;
> +		goto out;
> +	}
>  	*pagep = page;
>  
>  	ret = nfs_flush_incompatible(file, page);
> @@ -447,6 +451,11 @@ start:
>  		if (!ret)
>  			goto start;
>  	}
> + out:
> +	if (ret) {
> +		put_lseg(*fsdata);
> +		*fsdata = NULL;
> +	}
>  	return ret;
>  }
>  
> @@ -486,6 +495,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
>  
>  	unlock_page(page);
>  	page_cache_release(page);
> +	put_lseg(fsdata);
>  
>  	if (status < 0)
>  		return status;
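
The reference handling in the quoted patch can be summarised with a small
user-space sketch: the begin step takes a reference and hands it over
through fsdata, drops it again on error, and the end step releases it.
All sketch_ names and the int counter are assumptions for illustration;
the patch uses pnfs_update_layout() and put_lseg().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted layout segment. */
struct sketch_lseg {
	int refcount;
};

/* Drop one reference; NULL means no segment was stashed. */
static void sketch_put_lseg(struct sketch_lseg *lseg)
{
	if (lseg)
		lseg->refcount--;
}

/* "begin" step: take a reference and pass it to the caller via fsdata. */
static int sketch_write_begin(struct sketch_lseg *cached, int fail, void **fsdata)
{
	int ret = fail ? -1 : 0;	/* 'fail' simulates a later error in begin */

	assert(fsdata != NULL);
	if (cached)
		cached->refcount++;
	*fsdata = cached;

	if (ret) {
		/* Error path: release the reference so nothing leaks. */
		sketch_put_lseg(*fsdata);
		*fsdata = NULL;
	}
	return ret;
}

/* "end" step: the reference taken in begin is released here. */
static void sketch_write_end(void *fsdata)
{
	sketch_put_lseg(fsdata);
}
```

The point of the pattern is that every successful begin is balanced by
exactly one put in end, and a failed begin leaves fsdata NULL with no
reference held.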



* Re: [PATCH 13/24] pnfs_submit: read path changeover
       [not found]                               ` <AANLkTilecdPbSOJCDkGYH-X25gcZB-1fmBmU9mEpFO_y-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-06-09 19:39                                 ` Boaz Harrosh
  2010-06-09 19:46                                   ` Fred Isaman
  0 siblings, 1 reply; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-09 19:39 UTC (permalink / raw
  To: Fred Isaman; +Cc: Fred Isaman, linux-nfs

On 06/09/2010 10:29 PM, Fred Isaman wrote:
> On Wed, Jun 9, 2010 at 3:19 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>> On 06/08/2010 07:19 AM, Fred Isaman wrote:
>>> Change readpages path to only call LAYOUTGET once.
>>>
>>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>>> ---
>>>  fs/nfs/pagelist.c |    2 ++
>>>  fs/nfs/pnfs.c     |   37 +++++++------------------------------
>>>  fs/nfs/pnfs.h     |   25 ++++++++++++++++---------
>>>  3 files changed, 25 insertions(+), 39 deletions(-)
>>>
>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>> index ed647b9..c3e5a1f 100644
>>> --- a/fs/nfs/pagelist.c
>>> +++ b/fs/nfs/pagelist.c
>>> @@ -253,6 +253,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
>>>               return 0;
>>>       if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
>>>               return 0;
>>> +     if (req->wb_lseg != prev->wb_lseg)
>>> +             return 0;
>>>  #ifdef CONFIG_NFS_V4_1
>>>       if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
>>>               return 0;
>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>> index 2b5f6fc..692a18e 100644
>>> --- a/fs/nfs/pnfs.c
>>> +++ b/fs/nfs/pnfs.c
>>> @@ -1689,7 +1689,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>>  {
>>>       struct nfs_readargs *args = &rdata->args;
>>>       struct inode *inode = rdata->inode;
>>> -     int numpages, status, pgcount, temp;
>>> +     int numpages, pgcount, temp;
>>>       struct nfs_server *nfss = NFS_SERVER(inode);
>>>       struct nfs_inode *nfsi = NFS_I(inode);
>>>       struct pnfs_layout_segment *lseg;
>>> @@ -1701,19 +1701,8 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>>               args->count,
>>>               args->offset);
>>>
>>> -     /* Retrieve and set layout if not allready cached */
>>> -     status = _pnfs_update_layout(inode,
>>> -                                 args->context,
>>> -                                 args->count,
>>> -                                 args->offset,
>>> -                                 IOMODE_READ,
>>> -                                 &lseg);
>>> -     if (status) {
>>> -             dprintk("%s: Updating layout failed (%d), retry with NFS \n",
>>> -                     __func__, status);
>>> -             trypnfs = PNFS_NOT_ATTEMPTED;
>>> -             goto out;
>>> -     }
>>> +     lseg = rdata->req->wb_lseg;
>>> +     get_lseg(lseg);
>>>
>>>       /* Determine number of pages. */
>>>       pgcount = args->pgbase + args->count;
>>> @@ -1740,7 +1729,6 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>>               rdata->pdata.lseg = NULL;
>>>               put_lseg(lseg);
>>>       }
>>> - out:
>>>       dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>>>       return trypnfs;
>>>  }
>>> @@ -1749,21 +1737,10 @@ enum pnfs_try_status
>>>  _pnfs_try_to_read_data(struct nfs_read_data *data,
>>>                      const struct rpc_call_ops *call_ops)
>>>  {
>>> -     struct inode *ino = data->inode;
>>> -     struct nfs_server *nfss = NFS_SERVER(ino);
>>> -
>>> -     dprintk("--> %s\n", __func__);
>>> -     /* Only create an rpc request if utilizing NFSv4 I/O */
>>> -     if (!pnfs_enabled_sb(nfss) ||
>>> -         !nfss->pnfs_curr_ld->ld_io_ops->read_pagelist) {
>>> -             dprintk("<-- %s: not using pnfs\n", __func__);
>>> -             return PNFS_NOT_ATTEMPTED;
>>> -     } else {
>>> -             dprintk("%s: Utilizing pNFS I/O\n", __func__);
>>> -             data->pdata.call_ops = call_ops;
>>> -             data->pdata.pnfs_error = 0;
>>> -             return pnfs_readpages(data);
>>> -     }
>>
>> Wahoo, nice stuff Ha
>>
>> By now this can go into its only caller.
>> (And the caller de-inlined; what was that all about?)
>>
>>> +     dprintk("%s: Utilizing pNFS I/O\n", __func__);
>>> +     data->pdata.call_ops = call_ops;
>>> +     data->pdata.pnfs_error = 0;
>>> +     return pnfs_readpages(data);
>>>  }
>>>
>>>  enum pnfs_try_status
>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>> index 6326ed5..816ebe1 100644
>>> --- a/fs/nfs/pnfs.h
>>> +++ b/fs/nfs/pnfs.h
>>> @@ -94,22 +94,29 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
>>>       return nfss->pnfs_curr_ld != NULL;
>>>  }
>>>
>>> +static inline void _pnfs_clear_lseg_from_pages(struct list_head *head)
>>> +{
>>> +     struct nfs_page *req;
>>> +
>>> +     list_for_each_entry(req, head, wb_list) {
>>> +             put_lseg(req->wb_lseg);
>>> +             req->wb_lseg = NULL;
>>> +     }
>>> +}
>>> +
>>>  static inline enum pnfs_try_status
>>>  pnfs_try_to_read_data(struct nfs_read_data *data,
>>>                     const struct rpc_call_ops *call_ops)
>>
>> Don't think this needs to be inline; what's the point?
>>
> 
> The point is that it is in the header file, not a .c file.
> 

That's what I meant. Why is it in the header file? Why not
define it in a .c file and declare it in the header?

> Fred
> 

(-Bz

>>>  {
>>> -     struct inode *inode = data->inode;
>>> -     struct nfs_server *nfss = NFS_SERVER(inode);
>>>       enum pnfs_try_status ret;
>>>
>>> -     /* FIXME: read_pagelist should probably be mandated */
>>> -     if (PNFS_EXISTS_LDIO_OP(nfss, read_pagelist))
>>> -             ret = _pnfs_try_to_read_data(data, call_ops);
>>> -     else
>>> -             ret = PNFS_NOT_ATTEMPTED;
>>> -
>>> +     if (!data->req->wb_lseg)
>>> +             return PNFS_NOT_ATTEMPTED;
>>> +     ret = _pnfs_try_to_read_data(data, call_ops);
>>>       if (ret == PNFS_ATTEMPTED)
>>> -             nfs_inc_stats(inode, NFSIOS_PNFS_READ);
>>> +             nfs_inc_stats(data->inode, NFSIOS_PNFS_READ);
>>> +     else
>>> +             _pnfs_clear_lseg_from_pages(&data->pages);
>>>       return ret;
>>>  }
>>>
>>
>> Boaz
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>



* Re: [PATCH 13/24] pnfs_submit: read path changeover
  2010-06-09 19:39                                 ` Boaz Harrosh
@ 2010-06-09 19:46                                   ` Fred Isaman
  2010-06-10  6:26                                     ` Boaz Harrosh
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-09 19:46 UTC (permalink / raw
  To: Boaz Harrosh; +Cc: Fred Isaman, linux-nfs

On Wed, Jun 9, 2010 at 3:39 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
> On 06/09/2010 10:29 PM, Fred Isaman wrote:
>> On Wed, Jun 9, 2010 at 3:19 PM, Boaz Harrosh <bharrosh@panasas.com> wrote:
>>> On 06/08/2010 07:19 AM, Fred Isaman wrote:
>>>> Change readpages path to only call LAYOUTGET once.
>>>>
>>>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>>>> ---
>>>>  fs/nfs/pagelist.c |    2 ++
>>>>  fs/nfs/pnfs.c     |   37 +++++++------------------------------
>>>>  fs/nfs/pnfs.h     |   25 ++++++++++++++++---------
>>>>  3 files changed, 25 insertions(+), 39 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
>>>> index ed647b9..c3e5a1f 100644
>>>> --- a/fs/nfs/pagelist.c
>>>> +++ b/fs/nfs/pagelist.c
>>>> @@ -253,6 +253,8 @@ static int nfs_can_coalesce_requests(struct nfs_page *prev,
>>>>               return 0;
>>>>       if (prev->wb_pgbase + prev->wb_bytes != PAGE_CACHE_SIZE)
>>>>               return 0;
>>>> +     if (req->wb_lseg != prev->wb_lseg)
>>>> +             return 0;
>>>>  #ifdef CONFIG_NFS_V4_1
>>>>       if (pgio->pg_test && !pgio->pg_test(pgio, prev, req))
>>>>               return 0;
>>>> diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
>>>> index 2b5f6fc..692a18e 100644
>>>> --- a/fs/nfs/pnfs.c
>>>> +++ b/fs/nfs/pnfs.c
>>>> @@ -1689,7 +1689,7 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>>>  {
>>>>       struct nfs_readargs *args = &rdata->args;
>>>>       struct inode *inode = rdata->inode;
>>>> -     int numpages, status, pgcount, temp;
>>>> +     int numpages, pgcount, temp;
>>>>       struct nfs_server *nfss = NFS_SERVER(inode);
>>>>       struct nfs_inode *nfsi = NFS_I(inode);
>>>>       struct pnfs_layout_segment *lseg;
>>>> @@ -1701,19 +1701,8 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>>>               args->count,
>>>>               args->offset);
>>>>
>>>> -     /* Retrieve and set layout if not allready cached */
>>>> -     status = _pnfs_update_layout(inode,
>>>> -                                 args->context,
>>>> -                                 args->count,
>>>> -                                 args->offset,
>>>> -                                 IOMODE_READ,
>>>> -                                 &lseg);
>>>> -     if (status) {
>>>> -             dprintk("%s: Updating layout failed (%d), retry with NFS \n",
>>>> -                     __func__, status);
>>>> -             trypnfs = PNFS_NOT_ATTEMPTED;
>>>> -             goto out;
>>>> -     }
>>>> +     lseg = rdata->req->wb_lseg;
>>>> +     get_lseg(lseg);
>>>>
>>>>       /* Determine number of pages. */
>>>>       pgcount = args->pgbase + args->count;
>>>> @@ -1740,7 +1729,6 @@ pnfs_readpages(struct nfs_read_data *rdata)
>>>>               rdata->pdata.lseg = NULL;
>>>>               put_lseg(lseg);
>>>>       }
>>>> - out:
>>>>       dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
>>>>       return trypnfs;
>>>>  }
>>>> @@ -1749,21 +1737,10 @@ enum pnfs_try_status
>>>>  _pnfs_try_to_read_data(struct nfs_read_data *data,
>>>>                      const struct rpc_call_ops *call_ops)
>>>>  {
>>>> -     struct inode *ino = data->inode;
>>>> -     struct nfs_server *nfss = NFS_SERVER(ino);
>>>> -
>>>> -     dprintk("--> %s\n", __func__);
>>>> -     /* Only create an rpc request if utilizing NFSv4 I/O */
>>>> -     if (!pnfs_enabled_sb(nfss) ||
>>>> -         !nfss->pnfs_curr_ld->ld_io_ops->read_pagelist) {
>>>> -             dprintk("<-- %s: not using pnfs\n", __func__);
>>>> -             return PNFS_NOT_ATTEMPTED;
>>>> -     } else {
>>>> -             dprintk("%s: Utilizing pNFS I/O\n", __func__);
>>>> -             data->pdata.call_ops = call_ops;
>>>> -             data->pdata.pnfs_error = 0;
>>>> -             return pnfs_readpages(data);
>>>> -     }
>>>
>>> Wahoo, nice stuff Ha
>>>
>>> By now this can go into its only caller.
>>> (And the caller de-inlined; what was that all about?)
>>>
>>>> +     dprintk("%s: Utilizing pNFS I/O\n", __func__);
>>>> +     data->pdata.call_ops = call_ops;
>>>> +     data->pdata.pnfs_error = 0;
>>>> +     return pnfs_readpages(data);
>>>>  }
>>>>
>>>>  enum pnfs_try_status
>>>> diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
>>>> index 6326ed5..816ebe1 100644
>>>> --- a/fs/nfs/pnfs.h
>>>> +++ b/fs/nfs/pnfs.h
>>>> @@ -94,22 +94,29 @@ static inline int pnfs_enabled_sb(struct nfs_server *nfss)
>>>>       return nfss->pnfs_curr_ld != NULL;
>>>>  }
>>>>
>>>> +static inline void _pnfs_clear_lseg_from_pages(struct list_head *head)
>>>> +{
>>>> +     struct nfs_page *req;
>>>> +
>>>> +     list_for_each_entry(req, head, wb_list) {
>>>> +             put_lseg(req->wb_lseg);
>>>> +             req->wb_lseg = NULL;
>>>> +     }
>>>> +}
>>>> +
>>>>  static inline enum pnfs_try_status
>>>>  pnfs_try_to_read_data(struct nfs_read_data *data,
>>>>                     const struct rpc_call_ops *call_ops)
>>>
>>> Don't think this needs to be inline; what's the point?
>>>
>>
>> The point is that it is in the header file, not a .c file.
>>
>
> That's what I meant. Why is it in the header file? Why not
> define it in a .c file and declare it in the header?
>

To make it easy to ifdef out if CONFIG_NFS_V4_1 is not set.

Fred

>> Fred
>>
>
> (-Bz
>
>>>>  {
>>>> -     struct inode *inode = data->inode;
>>>> -     struct nfs_server *nfss = NFS_SERVER(inode);
>>>>       enum pnfs_try_status ret;
>>>>
>>>> -     /* FIXME: read_pagelist should probably be mandated */
>>>> -     if (PNFS_EXISTS_LDIO_OP(nfss, read_pagelist))
>>>> -             ret = _pnfs_try_to_read_data(data, call_ops);
>>>> -     else
>>>> -             ret = PNFS_NOT_ATTEMPTED;
>>>> -
>>>> +     if (!data->req->wb_lseg)
>>>> +             return PNFS_NOT_ATTEMPTED;
>>>> +     ret = _pnfs_try_to_read_data(data, call_ops);
>>>>       if (ret == PNFS_ATTEMPTED)
>>>> -             nfs_inc_stats(inode, NFSIOS_PNFS_READ);
>>>> +             nfs_inc_stats(data->inode, NFSIOS_PNFS_READ);
>>>> +     else
>>>> +             _pnfs_clear_lseg_from_pages(&data->pages);
>>>>       return ret;
>>>>  }
>>>>
>>>
>>> Boaz
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>
>


* Re: [PATCH 13/24] pnfs_submit: read path changeover
  2010-06-09 19:46                                   ` Fred Isaman
@ 2010-06-10  6:26                                     ` Boaz Harrosh
  0 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-10  6:26 UTC (permalink / raw
  To: Fred Isaman; +Cc: Fred Isaman, linux-nfs

On 06/09/2010 10:46 PM, Fred Isaman wrote:
>>>>>  static inline enum pnfs_try_status
>>>>>  pnfs_try_to_read_data(struct nfs_read_data *data,
>>>>>                     const struct rpc_call_ops *call_ops)
>>>>
>>>> I don't think this needs to be inline; what's the point?
>>>>
>>>
>>> The point is that it is in the header file, not a c file.
>>>
>>
>> That's what I meant. Why is it in the header file? Why not put it
>> in a .c file and just declare it in the header?
>>
> 
> To make it easy to ifdef out if CONFIG_NFS_V4_1 is not set.
> 
> Fred
> 

It's not easier, just:

#ifdef CONFIG_NFS_V4_1
enum pnfs_try_status
pnfs_try_to_read_data(struct nfs_read_data *data,
                     const struct rpc_call_ops *call_ops); /* declaration */
#else
static inline enum pnfs_try_status
pnfs_try_to_read_data(struct nfs_read_data *data,
                     const struct rpc_call_ops *call_ops) /* Empty inline */
{
	return ...;
}
#endif

We do that everywhere, and we should do the same here.
Boaz
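
A compilable sketch of that pattern, for illustration only. DEMO_FEATURE
stands in for CONFIG_NFS_V4_1, and demo_try_status / demo_try_to_read are
hypothetical names rather than the real pNFS symbols. The stub's return
value is also an assumption, since the snippet above leaves it elided.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_NFS_V4_1; uncomment to enable. */
/* #define DEMO_FEATURE 1 */

enum demo_try_status { DEMO_ATTEMPTED = 0, DEMO_NOT_ATTEMPTED = 1 };

/* ---- header part ---- */
#ifdef DEMO_FEATURE
/* Feature built in: only declare it, the body lives in the .c file. */
enum demo_try_status demo_try_to_read(int have_lseg);
#else
/* Feature compiled out: empty inline stub, so callers need no #ifdef. */
static inline enum demo_try_status demo_try_to_read(int have_lseg)
{
	(void)have_lseg;
	return DEMO_NOT_ATTEMPTED;	/* assumed fallback value */
}
#endif

/* ---- .c part, built only when the feature is enabled ---- */
#ifdef DEMO_FEATURE
enum demo_try_status demo_try_to_read(int have_lseg)
{
	return have_lseg ? DEMO_ATTEMPTED : DEMO_NOT_ATTEMPTED;
}
#endif

/* A caller looks the same in both configurations. */
static void demo_selfcheck(void)
{
#ifndef DEMO_FEATURE
	/* With the feature off, every request falls back. */
	assert(demo_try_to_read(1) == DEMO_NOT_ATTEMPTED);
#endif
	(void)demo_try_to_read(0);
}
```

The trade-off is that the real function body moves out of the header, so
it is no longer inlined into callers when the feature is enabled.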

>>> Fred
>>>
>>
>> (-Bz
>>



* Re: [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-09 12:08                               ` Fred Isaman
@ 2010-06-10 10:33                                 ` Fred Isaman
  2010-06-10 12:45                                   ` Benny Halevy
  0 siblings, 1 reply; 46+ messages in thread
From: Fred Isaman @ 2010-06-10 10:33 UTC (permalink / raw
  To: Benny Halevy; +Cc: linux-nfs


On Jun 9, 2010, at 8:08 AM, Fred Isaman wrote:

> On Wed, Jun 9, 2010 at 6:38 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>> Fred, how does that patch interact with
>> 285052f pnfs_post_submit: Restore "pnfs: pnfs_do_flush"
>> and the latter patches that depend on it?
>> 
>> Benny
>> 
> 
> They will have to be modified.  I'll look at that today.
> 
> Fred

OK, this is a general git question.  How in the world do I send in these modifications?

Basically, because of the way we have pnfs-submit in the middle of our tree, I have a branch that looks like:

A->B->C->D

I've inserted my new patch F between A and B, which requires a rebase of the subsequent patches:

A->F->B'->C'->D'

But that rebase is non-trivial, in particular for patch C (a block-layout patch), and I want to communicate the modifications I made.

The best I have been able to come up with is to do the minimal obvious rebase, just sufficient to remove all the conflict markers,
then add a following modification patch, so I would have something like:

A->F->B'->C'->C''->D'

and I could send in C''.  But this seems less than ideal, especially when you consider I have ~10 patches which would require this handling.

Fred


> 
>> On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
>>> Preparing for LAYOUTGET invocation in nfs_write_begin to be the
>>> only invocation in the write path.
>>> 
>>> It isn't used at all yet, but it should be properly referenced/dereferenced
>>> 
>>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>>> ---
>>>  fs/nfs/file.c |   16 +++++++++++++---
>>>  1 files changed, 13 insertions(+), 3 deletions(-)
>>> 
>>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>>> index 03601d2..fde6cb5 100644
>>> --- a/fs/nfs/file.c
>>> +++ b/fs/nfs/file.c
>>> @@ -420,6 +420,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>>>               file->f_path.dentry->d_name.name,
>>>               mapping->host->i_ino, len, (long long) pos);
>>> 
>>> +     pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
>>> +                        (struct pnfs_layout_segment **) fsdata);
>>>  start:
>>>       /*
>>>        * Prevent starvation issues if someone is doing a consistency
>>> @@ -428,11 +430,13 @@ start:
>>>       ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
>>>                       nfs_wait_bit_killable, TASK_KILLABLE);
>>>       if (ret)
>>> -             return ret;
>>> +             goto out;
>>> 
>>>       page = grab_cache_page_write_begin(mapping, index, flags);
>>> -     if (!page)
>>> -             return -ENOMEM;
>>> +     if (!page) {
>>> +             ret = -ENOMEM;
>>> +             goto out;
>>> +     }
>>>       *pagep = page;
>>> 
>>>       ret = nfs_flush_incompatible(file, page);
>>> @@ -447,6 +451,11 @@ start:
>>>               if (!ret)
>>>                       goto start;
>>>       }
>>> + out:
>>> +     if (ret) {
>>> +             put_lseg(*fsdata);
>>> +             *fsdata = NULL;
>>> +     }
>>>       return ret;
>>>  }
>>> 
>>> @@ -486,6 +495,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
>>> 
>>>       unlock_page(page);
>>>       page_cache_release(page);
>>> +     put_lseg(fsdata);
>>> 
>>>       if (status < 0)
>>>               return status;
>> 
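
The reference handling in the quoted nfs_write_begin/nfs_write_end hunks
can be sketched on its own. This is a simplified model, not the kernel
code: demo_lseg, demo_get_lseg, demo_put_lseg and the "fail" flag are
hypothetical, and the lookup that pnfs_update_layout performs is reduced
to taking a reference on a segment that may be NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical refcounted layout segment. */
struct demo_lseg { int refcount; };

static struct demo_lseg *demo_get_lseg(struct demo_lseg *l)
{
	if (l)
		l->refcount++;
	return l;
}

/* NULL-safe put, so callers can pass whatever was stashed. */
static void demo_put_lseg(struct demo_lseg *l)
{
	if (l) {
		assert(l->refcount > 0);
		l->refcount--;
	}
}

/* Stash a referenced lseg (or NULL) in *fsdata; undo it on any error. */
static int demo_write_begin(struct demo_lseg *layout, int fail, void **fsdata)
{
	int ret = 0;

	*fsdata = demo_get_lseg(layout);
	if (fail) {
		ret = -ENOMEM;	/* models a failed page grab */
		goto out;
	}
out:
	if (ret) {
		demo_put_lseg(*fsdata);
		*fsdata = NULL;
	}
	return ret;
}

/* Every successful begin is paired with exactly one put here. */
static void demo_write_end(void *fsdata)
{
	demo_put_lseg(fsdata);
}
```

Routing every early exit through the single "out" label is what keeps the
reference balanced: a failed begin never leaves a stashed lseg behind, so
nothing is dropped twice.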
>> 



* Re: [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-10 10:33                                 ` Fred Isaman
@ 2010-06-10 12:45                                   ` Benny Halevy
  2010-06-10 12:48                                     ` Benny Halevy
  0 siblings, 1 reply; 46+ messages in thread
From: Benny Halevy @ 2010-06-10 12:45 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On 06/10/2010 01:33 PM, Fred Isaman wrote:
> 
> On Jun 9, 2010, at 8:08 AM, Fred Isaman wrote:
> 
>> On Wed, Jun 9, 2010 at 6:38 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>>> Fred, how does that patch interact with
>>> 285052f pnfs_post_submit: Restore "pnfs: pnfs_do_flush"
>>> and the latter patches that depend on it?
>>>
>>> Benny
>>>
>>
>> They will have to be modified.  I'll look at that today.
>>
>> Fred
> 
> OK, this is a general git question.  How in the world do I send in these modifications?
> 
> Basically, because of the way we have pnfs-submit in the middle of our tree, I have a branch that looks like:
> 
> A->B->C->D
> 
> I've inserted my new patch F between A and B, which requires a rebase of the subsequent patches:
> 
> A->F->B'->C'->D'
> 
> But that rebase is non-trivial, in particular for patch C (a block-layout patch), and I want to communicate the modifications I made.
> 
> The best I have been able to come up with is to do the minimal obvious rebase, just sufficient to remove all the conflict markers,
> then add a following modification patch, so I would have something like:
> 
> A->F->B'->C'->C''->D'
> 
> and I could send in C''.  But this seems less than ideal, especially when you consider I have ~10 patches which would require this handling.
> 

You can either send a patchset based on C, and I can rebase
parts of it onto A and B, or just send the clean rebased
patches for B and C, and we can review the diffs (B vs. B' and
C vs. C').

Benny

> Fred
> 
> 
>>
>>> On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
>>>> Preparing for LAYOUTGET invocation in nfs_write_begin to be the
>>>> only invocation in the write path.
>>>>
>>>> It isn't used at all yet, but it should be properly referenced/dereferenced
>>>>
>>>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>>>> ---
>>>>  fs/nfs/file.c |   16 +++++++++++++---
>>>>  1 files changed, 13 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>>>> index 03601d2..fde6cb5 100644
>>>> --- a/fs/nfs/file.c
>>>> +++ b/fs/nfs/file.c
>>>> @@ -420,6 +420,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>>>>               file->f_path.dentry->d_name.name,
>>>>               mapping->host->i_ino, len, (long long) pos);
>>>>
>>>> +     pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
>>>> +                        (struct pnfs_layout_segment **) fsdata);
>>>>  start:
>>>>       /*
>>>>        * Prevent starvation issues if someone is doing a consistency
>>>> @@ -428,11 +430,13 @@ start:
>>>>       ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
>>>>                       nfs_wait_bit_killable, TASK_KILLABLE);
>>>>       if (ret)
>>>> -             return ret;
>>>> +             goto out;
>>>>
>>>>       page = grab_cache_page_write_begin(mapping, index, flags);
>>>> -     if (!page)
>>>> -             return -ENOMEM;
>>>> +     if (!page) {
>>>> +             ret = -ENOMEM;
>>>> +             goto out;
>>>> +     }
>>>>       *pagep = page;
>>>>
>>>>       ret = nfs_flush_incompatible(file, page);
>>>> @@ -447,6 +451,11 @@ start:
>>>>               if (!ret)
>>>>                       goto start;
>>>>       }
>>>> + out:
>>>> +     if (ret) {
>>>> +             put_lseg(*fsdata);
>>>> +             *fsdata = NULL;
>>>> +     }
>>>>       return ret;
>>>>  }
>>>>
>>>> @@ -486,6 +495,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
>>>>
>>>>       unlock_page(page);
>>>>       page_cache_release(page);
>>>> +     put_lseg(fsdata);
>>>>
>>>>       if (status < 0)
>>>>               return status;
>>>
>>>
> 


* Re: [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-10 12:45                                   ` Benny Halevy
@ 2010-06-10 12:48                                     ` Benny Halevy
  2010-06-10 13:09                                       ` Boaz Harrosh
  0 siblings, 1 reply; 46+ messages in thread
From: Benny Halevy @ 2010-06-10 12:48 UTC (permalink / raw
  To: Fred Isaman; +Cc: linux-nfs

On Jun. 10, 2010, 15:45 +0300, Benny Halevy <bhalevy@panasas.com> wrote:
> On 06/10/2010 01:33 PM, Fred Isaman wrote:
>>
>> On Jun 9, 2010, at 8:08 AM, Fred Isaman wrote:
>>
>>> On Wed, Jun 9, 2010 at 6:38 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>>>> Fred, how does that patch interact with
>>>> 285052f pnfs_post_submit: Restore "pnfs: pnfs_do_flush"
>>>> and the latter patches that depend on it?
>>>>
>>>> Benny
>>>>
>>>
>>> They will have to be modified.  I'll look at that today.
>>>
>>> Fred
>>
>> OK, this is a general git question.  How in the world do I send in these modifications?
>>
>> Basically, because of the way we have pnfs-submit in the middle of our tree, I have a branch that looks like:
>>
>> A->B->C->D
>>
>> I've inserted my new patch F between A and B, which requires a rebase of the subsequent patches:
>>
>> A->F->B'->C'->D'
>>
>> But that rebase is non-trivial, in particular for patch C (a block-layout patch), and I want to communicate the modifications I made.
>>
>> The best I have been able to come up with is to do the minimal obvious rebase, just sufficient to remove all the conflict markers,
>> then add a following modification patch, so I would have something like:
>>
>> A->F->B'->C'->C''->D'
>>
>> and I could send in C''.  But this seems less than ideal, especially when you consider I have ~10 patches which would require this handling.
>>
> 
> You can either send a patchset based on C, and I can rebase
> parts of it onto A and B, or just send the clean rebased
> patches for B and C, and we can review the diffs (B vs. B' and
> C vs. C').

Oh, and I apologize for having set a moving target for you.
I'm about to release a tree with merged patches from Alexandros,
Andy, and Boaz (merge conflicts should be minimal, though).

> 
> Benny
> 
>> Fred
>>
>>
>>>
>>>> On Jun. 08, 2010, 7:19 +0300, Fred Isaman <iisaman@netapp.com> wrote:
>>>>> Preparing for LAYOUTGET invocation in nfs_write_begin to be the
>>>>> only invocation in the write path.
>>>>>
>>>>> It isn't used at all yet, but it should be properly referenced/dereferenced
>>>>>
>>>>> Signed-off-by: Fred Isaman <iisaman@netapp.com>
>>>>> ---
>>>>>  fs/nfs/file.c |   16 +++++++++++++---
>>>>>  1 files changed, 13 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
>>>>> index 03601d2..fde6cb5 100644
>>>>> --- a/fs/nfs/file.c
>>>>> +++ b/fs/nfs/file.c
>>>>> @@ -420,6 +420,8 @@ static int nfs_write_begin(struct file *file, struct address_space *mapping,
>>>>>               file->f_path.dentry->d_name.name,
>>>>>               mapping->host->i_ino, len, (long long) pos);
>>>>>
>>>>> +     pnfs_update_layout(mapping->host, NULL, NFS4_MAX_UINT64, 0, IOMODE_RW,
>>>>> +                        (struct pnfs_layout_segment **) fsdata);
>>>>>  start:
>>>>>       /*
>>>>>        * Prevent starvation issues if someone is doing a consistency
>>>>> @@ -428,11 +430,13 @@ start:
>>>>>       ret = wait_on_bit(&NFS_I(mapping->host)->flags, NFS_INO_FLUSHING,
>>>>>                       nfs_wait_bit_killable, TASK_KILLABLE);
>>>>>       if (ret)
>>>>> -             return ret;
>>>>> +             goto out;
>>>>>
>>>>>       page = grab_cache_page_write_begin(mapping, index, flags);
>>>>> -     if (!page)
>>>>> -             return -ENOMEM;
>>>>> +     if (!page) {
>>>>> +             ret = -ENOMEM;
>>>>> +             goto out;
>>>>> +     }
>>>>>       *pagep = page;
>>>>>
>>>>>       ret = nfs_flush_incompatible(file, page);
>>>>> @@ -447,6 +451,11 @@ start:
>>>>>               if (!ret)
>>>>>                       goto start;
>>>>>       }
>>>>> + out:
>>>>> +     if (ret) {
>>>>> +             put_lseg(*fsdata);
>>>>> +             *fsdata = NULL;
>>>>> +     }
>>>>>       return ret;
>>>>>  }
>>>>>
>>>>> @@ -486,6 +495,7 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
>>>>>
>>>>>       unlock_page(page);
>>>>>       page_cache_release(page);
>>>>> +     put_lseg(fsdata);
>>>>>
>>>>>       if (status < 0)
>>>>>               return status;
>>>>
>>>>
>>



* Re: [PATCH 14/24] pnfs_submit: use fsdata to pass lseg
  2010-06-10 12:48                                     ` Benny Halevy
@ 2010-06-10 13:09                                       ` Boaz Harrosh
  0 siblings, 0 replies; 46+ messages in thread
From: Boaz Harrosh @ 2010-06-10 13:09 UTC (permalink / raw
  To: Benny Halevy; +Cc: Fred Isaman, linux-nfs

On 06/10/2010 03:48 PM, Benny Halevy wrote:
> On Jun. 10, 2010, 15:45 +0300, Benny Halevy <bhalevy@panasas.com> wrote:
>> On 06/10/2010 01:33 PM, Fred Isaman wrote:
>>>
>>> On Jun 9, 2010, at 8:08 AM, Fred Isaman wrote:
>>>
>>>> On Wed, Jun 9, 2010 at 6:38 AM, Benny Halevy <bhalevy@panasas.com> wrote:
>>>>> Fred, how does that patch interact with
>>>>> 285052f pnfs_post_submit: Restore "pnfs: pnfs_do_flush"
>>>>> and the latter patches that depend on it?
>>>>>
>>>>> Benny
>>>>>
>>>>
>>>> They will have to be modified.  I'll look at that today.
>>>>
>>>> Fred
>>>
>>> OK, this is a general git question.  How in the world do I send in these modifications?
>>>
>>> Basically, because of the way we have pnfs-submit in the middle of our tree, I have a branch that looks like:
>>>
>>> A->B->C->D
>>>
>>> I've inserted my new patch F between A and B, which requires a rebase of the subsequent patches:
>>>
>>> A->F->B'->C'->D'
>>>
>>> But that rebase is non-trivial, in particular for patch C (a block-layout patch), and I want to communicate the modifications I made.
>>>
>>> The best I have been able to come up with is to do the minimal obvious rebase, just sufficient to remove all the conflict markers,
>>> then add a following modification patch, so I would have something like:
>>>
>>> A->F->B'->C'->C''->D'
>>>
>>> and I could send in C''.  But this seems less than ideal, especially when you consider I have ~10 patches which would require this handling.
>>>
>>
>> You can either send a patchset based on C, and I can rebase
>> parts of it onto A and B, or just send the clean rebased
>> patches for B and C, and we can review the diffs (B vs. B' and
>> C vs. C').
> 
> Oh and I apologize for having set a moving target for you.
> I'm about to release a tree with merged patches from Alexandros,
> Andy, and Boaz (merge conflicts should be minimal though)
> 
>>
>> Benny
>>

Fred, do you have a git tree on linux-nfs.org or at NetApp that is open on the net?
If so, you could just publish your cleanup tree; Benny can then "git remote add" it
and cherry-pick or rebase from your branch very easily. We have done it that way a few times.
(We still need the patches posted for review, though.)

Boaz


End of thread; last message 2010-06-10 13:09 UTC.

Thread overview: 46+ messages
2010-06-08  4:18 [PATCH 00/24] LAYOUTGET invocation (rebased) Fred Isaman
2010-06-08  4:18 ` [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT" Fred Isaman
2010-06-08  4:18   ` [PATCH 02/24] Revert "pnfs: Enable O_DIRECT write path." Fred Isaman
2010-06-08  4:19     ` [PATCH 03/24] Revert "pnfs: Enable O_DIRECT read path." Fred Isaman
2010-06-08  4:19       ` [PATCH 04/24] Revert "pnfs: Add function to set up O_DIRECT I/O" Fred Isaman
2010-06-08  4:19         ` [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error Fred Isaman
2010-06-08  4:19           ` [PATCH 06/24] pnfs: filelayout: clean and breakup nfs4_pnfs_dserver_get Fred Isaman
2010-06-08  4:19             ` [PATCH 07/24] pnfs: filelayout: remove some dead code from filelayout_commit Fred Isaman
2010-06-08  4:19               ` [PATCH 08/24] pnfs: remove PNFS_LAYOUTGET_ON_OPEN Fred Isaman
2010-06-08  4:19                 ` [PATCH 09/24] pnfs: track the number of outstanding commits Fred Isaman
2010-06-08  4:19                   ` [PATCH 10/24] pnfs_submit: mandate basic io path operations for layout drivers Fred Isaman
2010-06-08  4:19                     ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Fred Isaman
2010-06-08  4:19                       ` [PATCH 12/24] pnfs_submit: stash and refcount lseg in read path Fred Isaman
2010-06-08  4:19                         ` [PATCH 13/24] pnfs_submit: read path changeover Fred Isaman
2010-06-08  4:19                           ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Fred Isaman
2010-06-08  4:19                             ` [PATCH 15/24] pnfs_submit: stash and refcount lseg in write path Fred Isaman
2010-06-08  4:19                               ` [PATCH 16/24] pnfs_submit: remove pnfs_file_operations Fred Isaman
2010-06-08  4:19                                 ` [PATCH 17/24] pnfs_submit: remove pnfs_update_layout_commit Fred Isaman
2010-06-08  4:19                                   ` [PATCH 18/24] pnfs_submit: remove pnfs_writepages LAYOUTGET invocation Fred Isaman
2010-06-08  4:19                                     ` [PATCH 19/24] pnfs: export some commit error handling for use by layout drivers Fred Isaman
2010-06-08  4:19                                       ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Fred Isaman
2010-06-08  4:19                                         ` [PATCH 21/24] pnfs_submit: filelayout: rewrite filelayout_commit to use new API Fred Isaman
2010-06-08  4:19                                           ` [PATCH 22/24] pnfs_submit: remove unecessary pnfs_fl_call_data field pnfs_client Fred Isaman
2010-06-08  4:19                                             ` [PATCH 23/24] pnfs_submit: remove unecessary pnfs_fl_call_data field commit_through_mds Fred Isaman
2010-06-08  4:19                                               ` [PATCH 24/24] pnfs_submit: pnfs_update_layout can return void Fred Isaman
2010-06-09  9:09                                         ` [PATCH 20/24] pnfs_submit: API change: remove pnfs_commit layoutget invocation Benny Halevy
2010-06-09 12:21                                           ` Fred Isaman
2010-06-09 15:12                                             ` Boaz Harrosh
2010-06-09 15:15                                               ` [PATCH] FIXME: pnfs-obj: Short circuit the objlayout_commit to be a no-op Boaz Harrosh
2010-06-08  7:34                                 ` [PATCH 16/24] pnfs_submit: remove pnfs_file_operations Christoph Hellwig
2010-06-09 10:38                             ` [PATCH 14/24] pnfs_submit: use fsdata to pass lseg Benny Halevy
2010-06-09 12:08                               ` Fred Isaman
2010-06-10 10:33                                 ` Fred Isaman
2010-06-10 12:45                                   ` Benny Halevy
2010-06-10 12:48                                     ` Benny Halevy
2010-06-10 13:09                                       ` Boaz Harrosh
2010-06-09 19:33                             ` Boaz Harrosh
2010-06-09 19:19                           ` [PATCH 13/24] pnfs_submit: read path changeover Boaz Harrosh
2010-06-09 19:29                             ` Fred Isaman
     [not found]                               ` <AANLkTilecdPbSOJCDkGYH-X25gcZB-1fmBmU9mEpFO_y-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-06-09 19:39                                 ` Boaz Harrosh
2010-06-09 19:46                                   ` Fred Isaman
2010-06-10  6:26                                     ` Boaz Harrosh
2010-06-09 18:58                       ` [PATCH 11/24] pnfs_submit: expose pnfs_update_layout, put_lseg, and get_lseg functions Boaz Harrosh
2010-06-09 19:20                         ` Fred Isaman
2010-06-09 18:18           ` [PATCH 05/24] SQUASHME: ensure pnfs_update_lseg clears lsegp on error Boaz Harrosh
2010-06-09 18:06   ` [PATCH 01/24] Revert "pnfs-nonfilelayout: Prelim support for non-file layout O_DIRECT" Boaz Harrosh
