LKML Archive mirror
* [PATCH 0/2] tmpfs: Improve tmpfs scalability
@ 2010-05-18 23:34 tim
  2010-05-19  9:27 ` Andi Kleen
  2010-05-21  1:55 ` Hugh Dickins
  0 siblings, 2 replies; 4+ messages in thread
From: tim @ 2010-05-18 23:34 UTC
  To: linux-kernel; +Cc: Andi Kleen


We created a token jar library that implements a
per-cpu cache of tokens, to avoid lock contention whenever
we retrieve a token from, or return a token to, a token jar.  Using this
library with tmpfs, we find that Aim7 fserver throughput improved 270%
on a 4-socket, 32-core NHM-EX system.

In the current implementation of tmpfs, whenever we
get a new page, stat_lock in shmem_sb_info needs to be acquired.
This causes a lot of lock contention when multiple
threads are using tmpfs simultaneously, which makes
systems with a large number of cpus scale poorly.
Almost 75% of cpu time was spent contending on
stat_lock when we ran the Aim7 fserver load with 128 threads
on a 4-socket, 32-core NHM-EX system.

The first patch in the series implements the quick token jar.
The second patch updates the shmem code of tmpfs to use this
library to improve tmpfs performance.

Regards,
Tim Chen 
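
For readers who want a concrete picture of the idea described in the cover letter, here is a minimal userspace sketch. It is not the code from the patches. Every name in it (struct token_jar, jar_init, jar_get, jar_put, local_cache) is hypothetical, a thread-local counter stands in for the per-cpu cache, and a pthread mutex stands in for stat_lock. It assumes a single jar per process.

```c
/* Userspace sketch of a token jar with a per-thread token cache.
 * All names are hypothetical; this is not the API from the patches. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct token_jar {
    pthread_mutex_t lock;   /* protects 'pool' only */
    long pool;              /* tokens not held in any local cache */
    long batch;             /* tokens moved per trip to the pool */
};

/* One cache per thread stands in for the per-cpu cache.
 * A single jar is assumed, for brevity. */
static _Thread_local long local_cache;

static void jar_init(struct token_jar *jar, long total, long batch)
{
    assert(total >= 0 && batch > 0);
    pthread_mutex_init(&jar->lock, NULL);
    jar->pool = total;
    jar->batch = batch;
}

/* Take one token. The fast path touches only the local cache. */
static bool jar_get(struct token_jar *jar)
{
    if (local_cache == 0) {
        /* Slow path: refill with up to one batch under the lock. */
        pthread_mutex_lock(&jar->lock);
        long n = jar->pool < jar->batch ? jar->pool : jar->batch;
        jar->pool -= n;
        pthread_mutex_unlock(&jar->lock);
        local_cache = n;
        if (n == 0)
            return false;   /* shared pool is empty */
    }
    local_cache--;
    return true;
}

/* Return one token. Spill a batch back once the cache holds two batches. */
static void jar_put(struct token_jar *jar)
{
    local_cache++;
    if (local_cache >= 2 * jar->batch) {
        pthread_mutex_lock(&jar->lock);
        jar->pool += jar->batch;
        pthread_mutex_unlock(&jar->lock);
        local_cache -= jar->batch;
    }
}
```

With a batch size of B, the lock is taken roughly once per B gets or puts instead of on every one, which is where the reduction in contention would come from. The sketch deliberately leaves out one hard part: jar_get can fail while other threads still hold cached tokens, so a usable version would need some way to reclaim tokens from the other caches before reporting that the jar is empty.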



* Re: [PATCH 0/2] tmpfs: Improve tmpfs scalability
  2010-05-18 23:34 [PATCH 0/2] tmpfs: Improve tmpfs scalability tim
@ 2010-05-19  9:27 ` Andi Kleen
  2010-05-21  1:55 ` Hugh Dickins
  1 sibling, 0 replies; 4+ messages in thread
From: Andi Kleen @ 2010-05-19  9:27 UTC
  To: tim; +Cc: linux-kernel

tim wrote:
>
> We created a token jar library that implements a
> per-cpu cache of tokens, to avoid lock contention whenever
> we retrieve a token from, or return a token to, a token jar.  Using this
> library with tmpfs, we find that Aim7 fserver throughput improved 270%
> on a 4-socket, 32-core NHM-EX system.
>
> In the current implementation of tmpfs, whenever we
> get a new page, stat_lock in shmem_sb_info needs to be acquired.
> This causes a lot of lock contention when multiple
> threads are using tmpfs simultaneously, which makes
> systems with a large number of cpus scale poorly.
> Almost 75% of cpu time was spent contending on
> stat_lock when we ran the Aim7 fserver load with 128 threads
> on a 4-socket, 32-core NHM-EX system.
>
> The first patch in the series implements the quick token jar.
> The second patch updates the shmem code of tmpfs to use this
> library to improve tmpfs performance.

I reviewed both patches and they look good to me.
Especially the token jar library should be useful in other places
too.

Reviewed-by: Andi Kleen <ak@linux.intel.com>

-Andi


* Re: [PATCH 0/2] tmpfs: Improve tmpfs scalability
  2010-05-18 23:34 [PATCH 0/2] tmpfs: Improve tmpfs scalability tim
  2010-05-19  9:27 ` Andi Kleen
@ 2010-05-21  1:55 ` Hugh Dickins
  2010-05-21 16:07   ` Tim Chen
  1 sibling, 1 reply; 4+ messages in thread
From: Hugh Dickins @ 2010-05-21  1:55 UTC
  To: tim; +Cc: linux-kernel, Andi Kleen, Andrew Morton

On Tue, 18 May 2010, tim wrote:
> 
> We created a token jar library that implements a
> per-cpu cache of tokens, to avoid lock contention whenever
> we retrieve a token from, or return a token to, a token jar.  Using this
> library with tmpfs, we find that Aim7 fserver throughput improved 270%
> on a 4-socket, 32-core NHM-EX system.
>
> In the current implementation of tmpfs, whenever we
> get a new page, stat_lock in shmem_sb_info needs to be acquired.
> This causes a lot of lock contention when multiple
> threads are using tmpfs simultaneously, which makes
> systems with a large number of cpus scale poorly.
> Almost 75% of cpu time was spent contending on
> stat_lock when we ran the Aim7 fserver load with 128 threads
> on a 4-socket, 32-core NHM-EX system.
>
> The first patch in the series implements the quick token jar.
> The second patch updates the shmem code of tmpfs to use this
> library to improve tmpfs performance.

Interesting, thank you - I'll take a look, but not this week.

I do hope you're using Aim7 just as an example: you know that mounting
tmpfs with nr_blocks=0,nr_inodes=0 skips those shmem_sb_info updates
altogether?  Mounting in such a way should be fine for getting better
numbers out of Aim7; but yes, there are real-life uses for tmpfs which
are safer with the nr_blocks,nr_inodes limits.

Hugh
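
As an illustration of the mount options mentioned above, here is a minimal sketch that mounts a tmpfs instance with both limits set to 0 through mount(2). The helper names (tmpfs_opts, mount_unlimited_tmpfs) are hypothetical, the caller supplies the mount point, and the call needs root (CAP_SYS_ADMIN) to succeed.

```c
/* Sketch: mount tmpfs with nr_blocks=0,nr_inodes=0 (no block or inode limit).
 * Helper names are hypothetical; mounting requires CAP_SYS_ADMIN. */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Build the tmpfs option string; 0 means "no limit" for either value.
 * Returns 0 on success, -1 if the buffer is too small. */
static int tmpfs_opts(char *buf, size_t len,
                      unsigned long nr_blocks, unsigned long nr_inodes)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len, "nr_blocks=%lu,nr_inodes=%lu",
                     nr_blocks, nr_inodes);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Mount an unlimited tmpfs at 'target'. Returns 0 or a negative errno. */
static int mount_unlimited_tmpfs(const char *target)
{
    char opts[64];

    if (target == NULL || strlen(target) == 0)
        return -EINVAL;
    if (tmpfs_opts(opts, sizeof(opts), 0, 0) != 0)
        return -EINVAL;
    /* The source argument is ignored by tmpfs; "tmpfs" is conventional. */
    if (mount("tmpfs", target, "tmpfs", 0, opts) != 0)
        return -errno;
    return 0;
}
```

This is suitable for benchmarking only when nothing else depends on the limits: as noted above, an unlimited tmpfs can consume memory without bound, which is why the limits are safer in real-life use.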


* Re: [PATCH 0/2] tmpfs: Improve tmpfs scalability
  2010-05-21  1:55 ` Hugh Dickins
@ 2010-05-21 16:07   ` Tim Chen
  0 siblings, 0 replies; 4+ messages in thread
From: Tim Chen @ 2010-05-21 16:07 UTC
  To: Hugh Dickins; +Cc: linux-kernel, Andi Kleen, Andrew Morton

On Thu, 2010-05-20 at 18:55 -0700, Hugh Dickins wrote:

> 
> Interesting, thank you - I'll take a look, but not this week.
> 
> I do hope you're using Aim7 just as an example: you know that mounting
> tmpfs with nr_blocks=0,nr_inodes=0 skips those shmem_sb_info updates
> altogether?  Mounting in such a way should be fine for getting better
> numbers out of Aim7; but yes, there are real-life uses for tmpfs which
> are safer with the nr_blocks,nr_inodes limits.
> 

Yes, Aim7 was provided as an example to illustrate the locking
bottleneck when block limits are imposed.

Tim


