[PATCH] Btrfs: improve inode hash function/inode lookup

All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] Btrfs: improve inode hash function/inode lookup
@ 2013-10-06 20:53 Filipe David Borba Manana
  2013-10-06 21:22 ` [PATCH v2] " Filipe David Borba Manana
  0 siblings, 1 reply; 7+ messages in thread
From: Filipe David Borba Manana @ 2013-10-06 20:53 UTC (permalink / raw
  To: linux-btrfs; +Cc: Filipe David Borba Manana

Currently the hash value used for adding an inode to the VFS's inode
hash table consists of the plain inode number, which is a 64 bits
integer. This results in hash table buckets (hlist_head lists) with
too many elements for at least 2 important scenarios:

1) When we have many subvolumes. Each subvolume has its own btree
   where its files and directories are added to, and each has its
   own objectid (inode number) namespace. This means that if we have
   N subvolumes, and all have inode number X associated to a file or
   directory, the corresponding inodes all map to the same hash table
   entry, resulting in a bucket (hlist_head list) with N elements;

2) On 32 bits machines. Th VFS hash values are unsigned longs, which
   are 32 bits wide on 32 bits machines, and the inode (objectid)
   numbers are 64 bits unsigned integers. We simply cast the inode
   numbers to hash values, which means that for all inodes with the
   same 32 bits lower half, the same hash bucket is used for all of
   them. For example, all inodes with a number (objectid) between
   0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff will end up in
   the same hash table bucket.

This change ensures the inode's hash value depends both on the
objectid (inode number) and its subvolume's (btree root) objectid.
For 32 bits machines, this change gives better entropy by making
the hash value depend on both the upper and lower 32 bits of the
64 bits hash previously computed.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---
 fs/btrfs/btrfs_inode.h |   20 ++++++++++++++++++++
 fs/btrfs/disk-io.c     |    2 +-
 fs/btrfs/inode.c       |    6 ++++--
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 71f074e..fbe745e 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -19,6 +19,7 @@
 #ifndef __BTRFS_I__
 #define __BTRFS_I__

+#include <linux/hash.h>
 #include "extent_map.h"
 #include "extent_io.h"
 #include "ordered-data.h"
@@ -179,6 +180,25 @@ static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
 	return container_of(inode, struct btrfs_inode, vfs_inode);
 }

+static inline unsigned long inode_hash(u64 objectid,
+				       const struct btrfs_root *root)
+{
+	u64 h = objectid ^ (root->objectid * GOLDEN_RATIO_PRIME);
+
+#if BITS_PER_LONG == 32
+	h = (h >> 32) ^ (h & 0xffffffff);
+#endif
+
+	return (unsigned long)h;
+}
+
+static inline void btrfs_insert_inode_hash(struct inode *inode)
+{
+	unsigned long h = inode_hash(inode->i_ino, BTRFS_I(inode)->root);
+
+	__insert_inode_hash(inode, h);
+}
+
 static inline u64 btrfs_ino(struct inode *inode)
 {
 	u64 ino = BTRFS_I(inode)->location.objectid;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ade6c0e..d205bdd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2294,7 +2294,7 @@ int open_ctree(struct super_block *sb,
 	       sizeof(struct btrfs_key));
 	set_bit(BTRFS_INODE_DUMMY,
 		&BTRFS_I(fs_info->btree_inode)->runtime_flags);
-	insert_inode_hash(fs_info->btree_inode);
+	btrfs_insert_inode_hash(fs_info->btree_inode);

 	spin_lock_init(&fs_info->block_group_cache_lock);
 	fs_info->block_group_cache_tree = RB_ROOT;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 13b470c..3c76bf4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4837,10 +4837,12 @@ static struct inode *btrfs_iget_locked(struct super_block *s,
 {
 	struct inode *inode;
 	struct btrfs_iget_args args;
+	unsigned long hashval = inode_hash(objectid, root);
+
 	args.ino = objectid;
 	args.root = root;

-	inode = iget5_locked(s, objectid, btrfs_find_actor,
+	inode = iget5_locked(s, hashval, btrfs_find_actor,
 			     btrfs_init_locked_inode,
 			     (void *)&args);
 	return inode;
@@ -5460,7 +5462,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 				BTRFS_INODE_NODATASUM;
 	}

-	insert_inode_hash(inode);
+	btrfs_insert_inode_hash(inode);
 	inode_tree_add(inode);

 	trace_btrfs_inode_new(inode);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2] Btrfs: improve inode hash function/inode lookup
  2013-10-06 20:53 [PATCH] Btrfs: improve inode hash function/inode lookup Filipe David Borba Manana
@ 2013-10-06 21:22 ` Filipe David Borba Manana
  2013-10-11 11:42   ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Filipe David Borba Manana @ 2013-10-06 21:22 UTC (permalink / raw
  To: linux-btrfs; +Cc: Filipe David Borba Manana

Currently the hash value used for adding an inode to the VFS's inode
hash table consists of the plain inode number, which is a 64 bits
integer. This results in hash table buckets (hlist_head lists) with
too many elements for at least 2 important scenarios:

1) When we have many subvolumes. Each subvolume has its own btree
   where its files and directories are added to, and each has its
   own objectid (inode number) namespace. This means that if we have
   N subvolumes, and all have inode number X associated to a file or
   directory, the corresponding inodes all map to the same hash table
   entry, resulting in a bucket (hlist_head list) with N elements;

2) On 32 bits machines. Th VFS hash values are unsigned longs, which
   are 32 bits wide on 32 bits machines, and the inode (objectid)
   numbers are 64 bits unsigned integers. We simply cast the inode
   numbers to hash values, which means that for all inodes with the
   same 32 bits lower half, the same hash bucket is used for all of
   them. For example, all inodes with a number (objectid) between
   0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff will end up in
   the same hash table bucket.

This change ensures the inode's hash value depends both on the
objectid (inode number) and its subvolume's (btree root) objectid.
For 32 bits machines, this change gives better entropy by making
the hash value depend on both the upper and lower 32 bits of the
64 bits hash previously computed.

Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
---

V2: Renamed function inode_hash() to btrfs_inode_hash().

 fs/btrfs/btrfs_inode.h |   20 ++++++++++++++++++++
 fs/btrfs/disk-io.c     |    2 +-
 fs/btrfs/inode.c       |    6 ++++--
 3 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index 71f074e..ac0b39d 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -19,6 +19,7 @@
 #ifndef __BTRFS_I__
 #define __BTRFS_I__

+#include <linux/hash.h>
 #include "extent_map.h"
 #include "extent_io.h"
 #include "ordered-data.h"
@@ -179,6 +180,25 @@ static inline struct btrfs_inode *BTRFS_I(struct inode *inode)
 	return container_of(inode, struct btrfs_inode, vfs_inode);
 }

+static inline unsigned long btrfs_inode_hash(u64 objectid,
+					     const struct btrfs_root *root)
+{
+	u64 h = objectid ^ (root->objectid * GOLDEN_RATIO_PRIME);
+
+#if BITS_PER_LONG == 32
+	h = (h >> 32) ^ (h & 0xffffffff);
+#endif
+
+	return (unsigned long)h;
+}
+
+static inline void btrfs_insert_inode_hash(struct inode *inode)
+{
+	unsigned long h = btrfs_inode_hash(inode->i_ino, BTRFS_I(inode)->root);
+
+	__insert_inode_hash(inode, h);
+}
+
 static inline u64 btrfs_ino(struct inode *inode)
 {
 	u64 ino = BTRFS_I(inode)->location.objectid;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ade6c0e..d205bdd 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2294,7 +2294,7 @@ int open_ctree(struct super_block *sb,
 	       sizeof(struct btrfs_key));
 	set_bit(BTRFS_INODE_DUMMY,
 		&BTRFS_I(fs_info->btree_inode)->runtime_flags);
-	insert_inode_hash(fs_info->btree_inode);
+	btrfs_insert_inode_hash(fs_info->btree_inode);

 	spin_lock_init(&fs_info->block_group_cache_lock);
 	fs_info->block_group_cache_tree = RB_ROOT;
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 13b470c..cc3de9d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4837,10 +4837,12 @@ static struct inode *btrfs_iget_locked(struct super_block *s,
 {
 	struct inode *inode;
 	struct btrfs_iget_args args;
+	unsigned long hashval = btrfs_inode_hash(objectid, root);
+
 	args.ino = objectid;
 	args.root = root;

-	inode = iget5_locked(s, objectid, btrfs_find_actor,
+	inode = iget5_locked(s, hashval, btrfs_find_actor,
 			     btrfs_init_locked_inode,
 			     (void *)&args);
 	return inode;
@@ -5460,7 +5462,7 @@ static struct inode *btrfs_new_inode(struct btrfs_trans_handle *trans,
 				BTRFS_INODE_NODATASUM;
 	}

-	insert_inode_hash(inode);
+	btrfs_insert_inode_hash(inode);
 	inode_tree_add(inode);

 	trace_btrfs_inode_new(inode);
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Btrfs: improve inode hash function/inode lookup
  2013-10-06 21:22 ` [PATCH v2] " Filipe David Borba Manana
@ 2013-10-11 11:42   ` David Sterba
  2013-10-11 17:20     ` Filipe David Manana
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2013-10-11 11:42 UTC (permalink / raw
  To: Filipe David Borba Manana; +Cc: linux-btrfs

On Sun, Oct 06, 2013 at 10:22:33PM +0100, Filipe David Borba Manana wrote:
> 2) On 32 bits machines. Th VFS hash values are unsigned longs, which
>    are 32 bits wide on 32 bits machines, and the inode (objectid)
>    numbers are 64 bits unsigned integers. We simply cast the inode
>    numbers to hash values, which means that for all inodes with the
>    same 32 bits lower half, the same hash bucket is used for all of
>    them. For example, all inodes with a number (objectid) between
>    0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff will end up in
>    the same hash table bucket.

Well, inode number that does not fit into 32 bits on a 32 bit machine
causes other problems. And subvolume ids that do not fit into 32 bits
cannot be stored in radix tree.

It would be safer to refuse creating/accessing inode/subvolume with
nubmer that does not fit into 32bits.

david

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Btrfs: improve inode hash function/inode lookup
  2013-10-11 11:42   ` David Sterba
@ 2013-10-11 17:20     ` Filipe David Manana
  2013-10-21 23:21       ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Filipe David Manana @ 2013-10-11 17:20 UTC (permalink / raw
  To: dsterba@suse.cz, Filipe David Borba Manana,
	linux-btrfs@vger.kernel.org

On Fri, Oct 11, 2013 at 12:42 PM, David Sterba <dsterba@suse.cz> wrote:
> On Sun, Oct 06, 2013 at 10:22:33PM +0100, Filipe David Borba Manana wrote:
>> 2) On 32 bits machines. Th VFS hash values are unsigned longs, which
>>    are 32 bits wide on 32 bits machines, and the inode (objectid)
>>    numbers are 64 bits unsigned integers. We simply cast the inode
>>    numbers to hash values, which means that for all inodes with the
>>    same 32 bits lower half, the same hash bucket is used for all of
>>    them. For example, all inodes with a number (objectid) between
>>    0x0000_0000_ffff_ffff and 0xffff_ffff_ffff_ffff will end up in
>>    the same hash table bucket.
>
> Well, inode number that does not fit into 32 bits on a 32 bit machine
> causes other problems. And subvolume ids that do not fit into 32 bits
> cannot be stored in radix tree.
>
> It would be safer to refuse creating/accessing inode/subvolume with
> nubmer that does not fit into 32bits.

Good point David. However it would be a rather radical behaviour change imho.
I think it should be a separate change if there are no objections
against such change.

>
> david



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Btrfs: improve inode hash function/inode lookup
  2013-10-11 17:20     ` Filipe David Manana
@ 2013-10-21 23:21       ` David Sterba
  2013-10-21 23:27         ` Filipe David Manana
  0 siblings, 1 reply; 7+ messages in thread
From: David Sterba @ 2013-10-21 23:21 UTC (permalink / raw
  To: Filipe David Manana; +Cc: linux-btrfs@vger.kernel.org

On Fri, Oct 11, 2013 at 06:20:16PM +0100, Filipe David Manana wrote:
> Good point David. However it would be a rather radical behaviour change imho.

A radical step to prevent this practictly impossible but nasty bug to
happen :)

> I think it should be a separate change if there are no objections
> against such change.

It is a separate change that makes your patch obsolete.

david

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Btrfs: improve inode hash function/inode lookup
  2013-10-21 23:21       ` David Sterba
@ 2013-10-21 23:27         ` Filipe David Manana
  2013-10-22 17:04           ` David Sterba
  0 siblings, 1 reply; 7+ messages in thread
From: Filipe David Manana @ 2013-10-21 23:27 UTC (permalink / raw
  To: dsterba@suse.cz, Filipe David Manana, linux-btrfs@vger.kernel.org

On Tue, Oct 22, 2013 at 12:21 AM, David Sterba <dsterba@suse.cz> wrote:
> On Fri, Oct 11, 2013 at 06:20:16PM +0100, Filipe David Manana wrote:
>> Good point David. However it would be a rather radical behaviour change imho.
>
> A radical step to prevent this practictly impossible but nasty bug to
> happen :)
>
>> I think it should be a separate change if there are no objections
>> against such change.
>
> It is a separate change that makes your patch obsolete.

Except the part of making the hash value depend on both object id and
root id, no?

>
> david



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2] Btrfs: improve inode hash function/inode lookup
  2013-10-21 23:27         ` Filipe David Manana
@ 2013-10-22 17:04           ` David Sterba
  0 siblings, 0 replies; 7+ messages in thread
From: David Sterba @ 2013-10-22 17:04 UTC (permalink / raw
  To: Filipe David Manana; +Cc: dsterba@suse.cz, linux-btrfs@vger.kernel.org

On Tue, Oct 22, 2013 at 12:27:21AM +0100, Filipe David Manana wrote:
> > It is a separate change that makes your patch obsolete.
> 
> Except the part of making the hash value depend on both object id and
> root id, no?

I've read the patch again, yes, case 1 makes sense.

david

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2013-10-22 17:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-06 20:53 [PATCH] Btrfs: improve inode hash function/inode lookup Filipe David Borba Manana
2013-10-06 21:22 ` [PATCH v2] " Filipe David Borba Manana
2013-10-11 11:42   ` David Sterba
2013-10-11 17:20     ` Filipe David Manana
2013-10-21 23:21       ` David Sterba
2013-10-21 23:27         ` Filipe David Manana
2013-10-22 17:04           ` David Sterba

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.