* NFS root lockups with -next 20110113
@ 2011-01-13 12:06 Mark Brown
2011-01-13 13:22 ` J. R. Okajima
0 siblings, 1 reply; 15+ messages in thread
From: Mark Brown @ 2011-01-13 12:06 UTC (permalink / raw
To: Trond Myklebust, Nick Piggin; +Cc: linux-nfs, linux-kernel, linux-fsdevel
With -next 20110113 I'm experiencing lockups shortly after userspace
starts when booting with my root filesystem on NFS, log below. I can
boot into /bin/sh but running real userspace triggers this very easily.
This was introduced sometime this week.
I've not bisected or otherwise investigated much yet, but I do notice
code added recently by Nick in "fs: rcu-walk for path lookup" showing up
in the backtrace so including him in the CCs.
Tail of the console log:
INIT: version 2.86 booting
[ 15.250000] BUG: spinlock recursion on CPU#0, rc/1151
[ 15.250000] lock: c74ad678, .magic: dead4ead, .owner: rc/1151, .owner_cpu: 0
[ 15.260000] [<c0035118>] (unwind_backtrace+0x0/0xe4) from [<c0190454>] (do_raw_spin_lock+0x48/0x148)
[ 15.260000] [<c0190454>] (do_raw_spin_lock+0x48/0x148) from [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180)
[ 15.270000] [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) from [<c00c8b1c>] (d_revalidate+0x30/0x58)
[ 15.280000] [<c00c8b1c>] (d_revalidate+0x30/0x58) from [<c00cb82c>] (link_path_walk+0xad8/0xb00)
[ 15.290000] [<c00cb82c>] (link_path_walk+0xad8/0xb00) from [<c00cba4c>] (do_path_lookup+0x44/0xcc)
[ 15.300000] [<c00cba4c>] (do_path_lookup+0x44/0xcc) from [<c00cc3f4>] (user_path_at+0x58/0x90)
[ 15.310000] [<c00cc3f4>] (user_path_at+0x58/0x90) from [<c00c4210>] (vfs_fstatat+0x28/0x54)
[ 15.320000] [<c00c4210>] (vfs_fstatat+0x28/0x54) from [<c00c431c>] (sys_stat64+0x14/0x34)
[ 15.330000] [<c00c431c>] (sys_stat64+0x14/0x34) from [<c002f4e0>] (ret_fast_syscall+0x0/0x30)
[ 25.830000] BUG: spinlock lockup on CPU#0, rc/1151, c74ad678
[ 25.830000] [<c0035118>] (unwind_backtrace+0x0/0xe4) from [<c0190518>] (do_raw_spin_lock+0x10c/0x148)
[ 25.840000] [<c0190518>] (do_raw_spin_lock+0x10c/0x148) from [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180)
[ 25.850000] [<c00c89f0>] (nameidata_dentry_drop_rcu+0x84/0x180) from [<c00c8b1c>] (d_revalidate+0x30/0x58)
[ 25.860000] [<c00c8b1c>] (d_revalidate+0x30/0x58) from [<c00cb82c>] (link_path_walk+0xad8/0xb00)
[ 25.870000] [<c00cb82c>] (link_path_walk+0xad8/0xb00) from [<c00cba4c>] (do_path_lookup+0x44/0xcc)
[ 25.870000] [<c00cba4c>] (do_path_lookup+0x44/0xcc) from [<c00cc3f4>] (user_path_at+0x58/0x90)
[ 25.880000] [<c00cc3f4>] (user_path_at+0x58/0x90) from [<c00c4210>] (vfs_fstatat+0x28/0x54)
[ 25.890000] [<c00c4210>] (vfs_fstatat+0x28/0x54) from [<c00c431c>] (sys_stat64+0x14/0x34)
[ 25.900000] [<c00c431c>] (sys_stat64+0x14/0x34) from [<c002f4e0>] (ret_fast_syscall+0x0/0x30)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-13 12:06 NFS root lockups with -next 20110113 Mark Brown
@ 2011-01-13 13:22 ` J. R. Okajima
2011-01-13 13:28 ` Santosh Shilimkar
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-13 13:22 UTC (permalink / raw
To: Mark Brown
Cc: Trond Myklebust, Santosh Shilimkar, Nick Piggin, linux-nfs,
linux-kernel, linux-fsdevel
Mark Brown:
> With -next 20110113 I'm experiencing lockups shortly after userspace
> starts when booting with my root filesystem on NFS, log below. I can
> boot into /bin/sh but running real userspace triggers this very easily.
> This was introduced sometime this week.
>
> I've not bisected or otherwise investigated much yet, but I do notice
> code added recently by Nick in "fs: rcu-walk for path lookup" showing up
> in the backtrace so including him in the CCs.
This and a report from Santosh Shilimkar look like the same problem
which I reported, and I am testing this patch. If you can, please try it
too.
Note: Of course this is not offcial fix.
J. R. Okajima
diff --git a/fs/namei.c b/fs/namei.c
index 5bb7588..51d052f 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry
{
struct fs_struct *fs = current->fs;
struct dentry *parent = nd->path.dentry;
+ int isroot;
BUG_ON(!(nd->flags & LOOKUP_RCU));
if (nd->root.mnt) {
@@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry
goto err_root;
}
spin_lock(&parent->d_lock);
- spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
- if (!__d_rcu_to_refcount(dentry, nd->seq))
- goto err;
+ isroot = IS_ROOT(dentry);
+ if (!isroot) {
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ if (!__d_rcu_to_refcount(dentry, nd->seq))
+ goto err;
+ }
/*
* If the sequence check on the child dentry passed, then the child has
* not been removed from its parent. This means the parent dentry must
* be valid and able to take a reference at this point.
*/
- BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
+ BUG_ON(!isroot && dentry->d_parent != parent);
BUG_ON(!parent->d_count);
parent->d_count++;
- spin_unlock(&dentry->d_lock);
+ if (!isroot)
+ spin_unlock(&dentry->d_lock);
spin_unlock(&parent->d_lock);
if (nd->root.mnt) {
path_get(&nd->root);
@@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct nameidata *nd, struct dentry *dentry
nd->flags &= ~LOOKUP_RCU;
return 0;
err:
- spin_unlock(&dentry->d_lock);
+ if (!isroot)
+ spin_unlock(&dentry->d_lock);
spin_unlock(&parent->d_lock);
err_root:
if (nd->root.mnt)
^ permalink raw reply related [flat|nested] 15+ messages in thread
* RE: NFS root lockups with -next 20110113
2011-01-13 13:22 ` J. R. Okajima
@ 2011-01-13 13:28 ` Santosh Shilimkar
2011-01-13 13:45 ` J. R. Okajima
2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown
2011-01-13 13:41 ` Santosh Shilimkar
2 siblings, 1 reply; 15+ messages in thread
From: Santosh Shilimkar @ 2011-01-13 13:28 UTC (permalink / raw
To: J. R. Okajima, Mark Brown
Cc: Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel,
linux-fsdevel
> -----Original Message-----
> From: J. R. Okajima [mailto:hooanon05@yahoo.co.jp]
> Sent: Thursday, January 13, 2011 6:52 PM
> To: Mark Brown
> Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux-
> nfs@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> fsdevel@vger.kernel.org
> Subject: Re: NFS root lockups with -next 20110113
>
>
> Mark Brown:
> > With -next 20110113 I'm experiencing lockups shortly after
> userspace
> > starts when booting with my root filesystem on NFS, log below. I
> can
> > boot into /bin/sh but running real userspace triggers this very
> easily.
> > This was introduced sometime this week.
> >
> > I've not bisected or otherwise investigated much yet, but I do
> notice
> > code added recently by Nick in "fs: rcu-walk for path lookup"
> showing up
> > in the backtrace so including him in the CCs.
>
> This and a report from Santosh Shilimkar look like the same problem
> which I reported, and I am testing this patch. If you can, please
> try it
> too.
> Note: Of course this is not offcial fix.
>
It works. My board booted with NFS with below patch
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 5bb7588..51d052f 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
> {
> struct fs_struct *fs = current->fs;
> struct dentry *parent = nd->path.dentry;
> + int isroot;
>
> BUG_ON(!(nd->flags & LOOKUP_RCU));
> if (nd->root.mnt) {
> @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
> goto err_root;
> }
> spin_lock(&parent->d_lock);
> - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> - if (!__d_rcu_to_refcount(dentry, nd->seq))
> - goto err;
> + isroot = IS_ROOT(dentry);
> + if (!isroot) {
> + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> + if (!__d_rcu_to_refcount(dentry, nd->seq))
> + goto err;
> + }
> /*
> * If the sequence check on the child dentry passed, then the
> child has
> * not been removed from its parent. This means the parent
> dentry must
> * be valid and able to take a reference at this point.
> */
> - BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
> + BUG_ON(!isroot && dentry->d_parent != parent);
> BUG_ON(!parent->d_count);
> parent->d_count++;
> - spin_unlock(&dentry->d_lock);
> + if (!isroot)
> + spin_unlock(&dentry->d_lock);
> spin_unlock(&parent->d_lock);
> if (nd->root.mnt) {
> path_get(&nd->root);
> @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct
> nameidata *nd, struct dentry *dentry
> nd->flags &= ~LOOKUP_RCU;
> return 0;
> err:
> - spin_unlock(&dentry->d_lock);
> + if (!isroot)
> + spin_unlock(&dentry->d_lock);
> spin_unlock(&parent->d_lock);
> err_root:
> if (nd->root.mnt)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-13 13:22 ` J. R. Okajima
2011-01-13 13:28 ` Santosh Shilimkar
@ 2011-01-13 13:35 ` Mark Brown
2011-01-13 13:41 ` Santosh Shilimkar
2 siblings, 0 replies; 15+ messages in thread
From: Mark Brown @ 2011-01-13 13:35 UTC (permalink / raw
To: J. R. Okajima
Cc: Trond Myklebust, Santosh Shilimkar, Nick Piggin, linux-nfs,
linux-kernel, linux-fsdevel
On Thu, Jan 13, 2011 at 10:22:07PM +0900, J. R. Okajima wrote:
> This and a report from Santosh Shilimkar look like the same problem
> which I reported, and I am testing this patch. If you can, please try it
> too.
> Note: Of course this is not offcial fix.
Seems to do the trick:
Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* RE: NFS root lockups with -next 20110113
2011-01-13 13:22 ` J. R. Okajima
2011-01-13 13:28 ` Santosh Shilimkar
2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown
@ 2011-01-13 13:41 ` Santosh Shilimkar
2 siblings, 0 replies; 15+ messages in thread
From: Santosh Shilimkar @ 2011-01-13 13:41 UTC (permalink / raw
To: J. R. Okajima, Mark Brown
Cc: Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel,
linux-fsdevel, linux-arm-kernel
(+ 'linux-arm' since same problem was getting discussed
in another thread)
> -----Original Message-----
> From: Santosh Shilimkar [mailto:santosh.shilimkar@ti.com]
> Sent: Thursday, January 13, 2011 6:59 PM
> To: 'J. R. Okajima'; 'Mark Brown'
> Cc: 'Trond Myklebust'; 'Nick Piggin'; 'linux-nfs@vger.kernel.org';
> 'linux-kernel@vger.kernel.org'; 'linux-fsdevel@vger.kernel.org'
> Subject: RE: NFS root lockups with -next 20110113
>
> > -----Original Message-----
> > From: J. R. Okajima [mailto:hooanon05@yahoo.co.jp]
> > Sent: Thursday, January 13, 2011 6:52 PM
> > To: Mark Brown
> > Cc: Trond Myklebust; Santosh Shilimkar; Nick Piggin; linux-
> > nfs@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> > fsdevel@vger.kernel.org
> > Subject: Re: NFS root lockups with -next 20110113
> >
> >
> > Mark Brown:
> > > With -next 20110113 I'm experiencing lockups shortly after
> > userspace
> > > starts when booting with my root filesystem on NFS, log below.
> I
> > can
> > > boot into /bin/sh but running real userspace triggers this very
> > easily.
> > > This was introduced sometime this week.
> > >
> > > I've not bisected or otherwise investigated much yet, but I do
> > notice
> > > code added recently by Nick in "fs: rcu-walk for path lookup"
> > showing up
> > > in the backtrace so including him in the CCs.
> >
> > This and a report from Santosh Shilimkar look like the same
> problem
> > which I reported, and I am testing this patch. If you can, please
> > try it
> > too.
> > Note: Of course this is not offcial fix.
> >
>
> It works. My board booted with NFS with below patch
>
>
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index 5bb7588..51d052f 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -480,6 +480,7 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> > {
> > struct fs_struct *fs = current->fs;
> > struct dentry *parent = nd->path.dentry;
> > + int isroot;
> >
> > BUG_ON(!(nd->flags & LOOKUP_RCU));
> > if (nd->root.mnt) {
> > @@ -489,18 +490,22 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> > goto err_root;
> > }
> > spin_lock(&parent->d_lock);
> > - spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> > - if (!__d_rcu_to_refcount(dentry, nd->seq))
> > - goto err;
> > + isroot = IS_ROOT(dentry);
> > + if (!isroot) {
> > + spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> > + if (!__d_rcu_to_refcount(dentry, nd->seq))
> > + goto err;
> > + }
> > /*
> > * If the sequence check on the child dentry passed, then the
> > child has
> > * not been removed from its parent. This means the parent
> > dentry must
> > * be valid and able to take a reference at this point.
> > */
> > - BUG_ON(!IS_ROOT(dentry) && dentry->d_parent != parent);
> > + BUG_ON(!isroot && dentry->d_parent != parent);
> > BUG_ON(!parent->d_count);
> > parent->d_count++;
> > - spin_unlock(&dentry->d_lock);
> > + if (!isroot)
> > + spin_unlock(&dentry->d_lock);
> > spin_unlock(&parent->d_lock);
> > if (nd->root.mnt) {
> > path_get(&nd->root);
> > @@ -513,7 +518,8 @@ static int nameidata_dentry_drop_rcu(struct
> > nameidata *nd, struct dentry *dentry
> > nd->flags &= ~LOOKUP_RCU;
> > return 0;
> > err:
> > - spin_unlock(&dentry->d_lock);
> > + if (!isroot)
> > + spin_unlock(&dentry->d_lock);
> > spin_unlock(&parent->d_lock);
> > err_root:
> > if (nd->root.mnt)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-13 13:28 ` Santosh Shilimkar
@ 2011-01-13 13:45 ` J. R. Okajima
2011-01-14 3:59 ` Nick Piggin
0 siblings, 1 reply; 15+ messages in thread
From: J. R. Okajima @ 2011-01-13 13:45 UTC (permalink / raw
To: Santosh Shilimkar
Cc: Mark Brown, Trond Myklebust, Nick Piggin, linux-nfs, linux-kernel,
linux-fsdevel
Santosh Shilimkar:
> It works. My board booted with NFS with below patch
----------------------------------------
Mark Brown;
> Seems to do the trick:
>
> Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>
Thank you so quick tests.
When I post the patch to be mainlined, I will add Tested-by for both of
you. But Nick Piggin may take another approach.
J. R. Okajima
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-13 13:45 ` J. R. Okajima
@ 2011-01-14 3:59 ` Nick Piggin
2011-01-14 4:41 ` J. R. Okajima
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
0 siblings, 2 replies; 15+ messages in thread
From: Nick Piggin @ 2011-01-14 3:59 UTC (permalink / raw
To: J. R. Okajima
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
On Fri, Jan 14, 2011 at 12:45 AM, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Santosh Shilimkar:
>> It works. My board booted with NFS with below patch
>
> ----------------------------------------
>
> Mark Brown;
>> Seems to do the trick:
>>
>> Tested-by: Mark Brown <broonie@opensource.wolsfonmicro.com>
>
> Thank you so quick tests.
> When I post the patch to be mainlined, I will add Tested-by for both of
> you. But Nick Piggin may take another approach.
Thanks for your help, can you see how I've fixed it in my vfs-scale
tree? What do you think?
Thanks,
Nick
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: NFS root lockups with -next 20110113
2011-01-14 3:59 ` Nick Piggin
@ 2011-01-14 4:41 ` J. R. Okajima
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
1 sibling, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-14 4:41 UTC (permalink / raw
To: Nick Piggin
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
Nick Piggin:
> Thanks for your help, can you see how I've fixed it in my vfs-scale
> tree? What do you think?
I will.
It may take some time.
J. R. Okajima
^ permalink raw reply [flat|nested] 15+ messages in thread
* vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-01-14 3:59 ` Nick Piggin
2011-01-14 4:41 ` J. R. Okajima
@ 2011-01-19 6:43 ` J. R. Okajima
2011-01-19 7:21 ` Nick Piggin
2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
1 sibling, 2 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-19 6:43 UTC (permalink / raw
To: Nick Piggin
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
Hi,
Nick Piggin:
> Thanks for your help, can you see how I've fixed it in my vfs-scale
> tree? What do you think?
Your fix is great. I have no objection at all.
Other than the fix, here are more generic questions about vfs-scale work.
I am happy if you reply when you have time.
- getcwd(2) needs d_lock?
It acquires rename_lock and then tests whether the pwd is removed by
d_unhashed(). If a race condition between vfs_rename_dir() which may
unhash/rehash the dentry happens, then getcwd() may return the wrong
result due to unprotected d_unhashed() call, I am afraid. rename_lock
doesn't help this case.
- what is the right order of dget() and mntget()?
If I remember correctly, someone said "mntget() first and then
dget(). when putting, do in reverse" in the discussion when
path_{get,put}() were born. So it is called "the right order" in the
commit log.
It was many years ago. Is it still true? And should rcu-walk follow it
too? The current implementation doesn't seem to care about this order.
- d_move() and rename_lock
This may be out of rcu-walk work, but rename_lock in d_move() looks
outstanding since it surely kills concurrency. It is a pity that two
unrelated but concurrent d_move-s are serialized when we run rename(2)
on two different filesystems. Even if all of dentries, parents and
hash buckets are different from each other, d_move() never run
concurrently.
J. R. Okajima
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
@ 2011-01-19 7:21 ` Nick Piggin
2011-01-20 9:05 ` vfs-scale, general questions J. R. Okajima
2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
1 sibling, 1 reply; 15+ messages in thread
From: Nick Piggin @ 2011-01-19 7:21 UTC (permalink / raw
To: J. R. Okajima
Cc: Santosh Shilimkar, Mark Brown, Trond Myklebust, Nick Piggin,
linux-nfs, linux-kernel, linux-fsdevel
On Wed, Jan 19, 2011 at 5:43 PM, J. R. Okajima <hooanon05@yahoo.co.jp> wrote:
>
> Hi,
>
> Nick Piggin:
>> Thanks for your help, can you see how I've fixed it in my vfs-scale
>> tree? What do you think?
>
> Your fix is great. I have no objection at all.
> Other than the fix, here are more generic questions about vfs-scale work.
> I am happy if you reply when you have time.
Thanks for reviewing.
> - getcwd(2) needs d_lock?
> It acquires rename_lock and then tests whether the pwd is removed by
> d_unhashed(). If a race condition between vfs_rename_dir() which may
> unhash/rehash the dentry happens, then getcwd() may return the wrong
> result due to unprotected d_unhashed() call, I am afraid. rename_lock
> doesn't help this case.
We have the lock in write mode there, so it should exclude that
particular race. But I need to take another look at this code I
think, I'm not sure it's completely right, so I would appreciate reviews.
A while back I had some extra checks in there and would restart
the entire reverse walk in case of races... but need to think about
it.
> - what is the right order of dget() and mntget()?
> If I remember correctly, someone said "mntget() first and then
> dget(). when putting, do in reverse" in the discussion when
> path_{get,put}() were born. So it is called "the right order" in the
> commit log.
> It was many years ago. Is it still true? And should rcu-walk follow it
> too? The current implementation doesn't seem to care about this order.
Well dget and mntget is not a problem, because we can only do
mntget while already guaranteeing a reference on the mount, and
only dget when already guaranteeing a ref on the dentry (and mount).
But dput must happen before mntput so you don't have dentry ref
without mnt ref. Can you point out where rcu-walk does this wrongly?
> - d_move() and rename_lock
> This may be out of rcu-walk work, but rename_lock in d_move() looks
> outstanding since it surely kills concurrency. It is a pity that two
> unrelated but concurrent d_move-s are serialized when we run rename(2)
> on two different filesystems. Even if all of dentries, parents and
> hash buckets are different from each other, d_move() never run
> concurrently.
Yes I have a patch for that. I made a small hash table of rename locks.
This makes independent same-dir renames scalable. However that was
not the main motivation of the patch. On a really big POWER7 system,
the lookup path goes into a strange bimodal behaviour in the presence
of a relatively small amount of rename activity and sometimes starves
and throughput crashes. Breaking up rename_lock solves that too.
I'll wait until things settle down a bit more and perhaps have a chance
to get more numbers before submitting it (although I can show you when
I get back).
Thanks,
Nick
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions
2011-01-19 7:21 ` Nick Piggin
@ 2011-01-20 9:05 ` J. R. Okajima
2011-01-20 11:15 ` Miklos Szeredi
0 siblings, 1 reply; 15+ messages in thread
From: J. R. Okajima @ 2011-01-20 9:05 UTC (permalink / raw
To: Nick Piggin; +Cc: Nick Piggin, linux-kernel, linux-fsdevel
(Some of mail destinations are removed since this is not nfs specific
anymore.)
Nick Piggin:
> > - getcwd(2) needs d_lock?
> > =A0It acquires rename_lock and then tests whether the pwd is removed by
> > =A0d_unhashed(). If a race condition between vfs_rename_dir() which may
> > =A0unhash/rehash the dentry happens, then getcwd() may return the wrong
> > =A0result due to unprotected d_unhashed() call, I am afraid. rename_lock
> > =A0doesn't help this case.
>
> We have the lock in write mode there, so it should exclude that
> particular race. But I need to take another look at this code I
> think, I'm not sure it's completely right, so I would appreciate reviews.
You might think about the race around d_move, but what I meant is the
race between d_unlinked and unhash/rehash.
- getcwd return -ENOENT when pwd is unhashed.
- vfs_rename_dir()
+ makes the existing target unhashed.
+ FS ->rename() is called, here let's assume an error happened. so the
target dir is surely alive and reachable, nothing have been changed.
+ vfs_rename_dir() rehashes it again.
During this unhashed period, getcwd(2) may be issued.
And I am afraid it may return an error incorrectly.
About other issues, I will reply when I have time.
J. R. Okajima
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions
2011-01-20 9:05 ` vfs-scale, general questions J. R. Okajima
@ 2011-01-20 11:15 ` Miklos Szeredi
2011-01-21 6:38 ` J. R. Okajima
0 siblings, 1 reply; 15+ messages in thread
From: Miklos Szeredi @ 2011-01-20 11:15 UTC (permalink / raw
To: J. R. Okajima; +Cc: npiggin, npiggin, linux-kernel, linux-fsdevel
On Thu, 20 Jan 2011, J. R. Okajima wrote:
> You might think about the race around d_move, but what I meant is the
> race between d_unlinked and unhash/rehash.
>
> - getcwd return -ENOENT when pwd is unhashed.
> - vfs_rename_dir()
> + makes the existing target unhashed.
No it doesn't. dentry_unhash() doesn't drop the dentry from the hash
unless it has no other references, cwd being one such ref.
Thanks,
Miklos
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions
2011-01-20 11:15 ` Miklos Szeredi
@ 2011-01-21 6:38 ` J. R. Okajima
0 siblings, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-01-21 6:38 UTC (permalink / raw
To: Miklos Szeredi; +Cc: npiggin, npiggin, linux-kernel, linux-fsdevel
Miklos Szeredi:
> No it doesn't. dentry_unhash() doesn't drop the dentry from the hash
> unless it has no other references, cwd being one such ref.
I see.
When chdir() completets before vfs_rename_dir(), the dir will not be
unhashed. It is OK.
Even if the dir is unhashed temporary before chdir(), then the lookup
fallback to ref-walk and i_mutex protects. It is OK too.
Thank you.
J. R. Okajima
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
2011-01-19 7:21 ` Nick Piggin
@ 2011-02-11 3:49 ` Ian Kent
2011-02-13 2:19 ` J. R. Okajima
1 sibling, 1 reply; 15+ messages in thread
From: Ian Kent @ 2011-02-11 3:49 UTC (permalink / raw
To: J. R. Okajima
Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust,
Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel
On Wed, 2011-01-19 at 15:43 +0900, J. R. Okajima wrote:
> Hi,
>
> Nick Piggin:
> > Thanks for your help, can you see how I've fixed it in my vfs-scale
> > tree? What do you think?
>
> Your fix is great. I have no objection at all.
> Other than the fix, here are more generic questions about vfs-scale work.
> I am happy if you reply when you have time.
>
> - getcwd(2) needs d_lock?
> It acquires rename_lock and then tests whether the pwd is removed by
> d_unhashed(). If a race condition between vfs_rename_dir() which may
> unhash/rehash the dentry happens, then getcwd() may return the wrong
> result due to unprotected d_unhashed() call, I am afraid. rename_lock
> doesn't help this case.
>
> - what is the right order of dget() and mntget()?
> If I remember correctly, someone said "mntget() first and then
> dget(). when putting, do in reverse" in the discussion when
> path_{get,put}() were born. So it is called "the right order" in the
> commit log.
> It was many years ago. Is it still true? And should rcu-walk follow it
> too? The current implementation doesn't seem to care about this order.
I didn't spot that, where did you see this?
I'm not sure about the get but I fairly sure the dput() has to be before
the mntput() because the shrink_dcache_*() cleanup routines object to
dentrys that have a reference count of more than one.
Ian
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: vfs-scale, general questions (Re: NFS root lockups with -next 20110113)
2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
@ 2011-02-13 2:19 ` J. R. Okajima
0 siblings, 0 replies; 15+ messages in thread
From: J. R. Okajima @ 2011-02-13 2:19 UTC (permalink / raw
To: Ian Kent
Cc: Nick Piggin, Santosh Shilimkar, Mark Brown, Trond Myklebust,
Nick Piggin, linux-nfs, linux-kernel, linux-fsdevel
Ian Kent:
> > - what is the right order of dget() and mntget()?
> > If I remember correctly, someone said "mntget() first and then
> > dget(). when putting, do in reverse" in the discussion when
> > path_{get,put}() were born. So it is called "the right order" in the
> > commit log.
> > It was many years ago. Is it still true? And should rcu-walk follow it
> > too? The current implementation doesn't seem to care about this order.
>
> I didn't spot that, where did you see this?
>
> I'm not sure about the get but I fairly sure the dput() has to be before
> the mntput() because the shrink_dcache_*() cleanup routines object to
> dentrys that have a reference count of more than one.
For dget - mntget, there are several such code. For example,
nameidata_dentry_drop_rcu()
{
struct dentry *parent = nd->path.dentry;
:::
parent->d_count++;
spin_unlock(&dentry->d_lock);
spin_unlock(&parent->d_lock);
:::
mntget(nd->path.mnt);
:::
But I am not sure the "get" order is a problem.
Nick Piggin also replied and said dget and mntget is not a problem, and
I replied if I found such "put" order, I would write again.
J. R. Okajima
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-02-13 2:19 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-01-13 12:06 NFS root lockups with -next 20110113 Mark Brown
2011-01-13 13:22 ` J. R. Okajima
2011-01-13 13:28 ` Santosh Shilimkar
2011-01-13 13:45 ` J. R. Okajima
2011-01-14 3:59 ` Nick Piggin
2011-01-14 4:41 ` J. R. Okajima
2011-01-19 6:43 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) J. R. Okajima
2011-01-19 7:21 ` Nick Piggin
2011-01-20 9:05 ` vfs-scale, general questions J. R. Okajima
2011-01-20 11:15 ` Miklos Szeredi
2011-01-21 6:38 ` J. R. Okajima
2011-02-11 3:49 ` vfs-scale, general questions (Re: NFS root lockups with -next 20110113) Ian Kent
2011-02-13 2:19 ` J. R. Okajima
2011-01-13 13:35 ` NFS root lockups with -next 20110113 Mark Brown
2011-01-13 13:41 ` Santosh Shilimkar
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).