* [PATCH 0/2] tmpfs: Improve tmpfs scalability
@ 2010-05-18 23:34 tim
2010-05-19 9:27 ` Andi Kleen
2010-05-21 1:55 ` Hugh Dickins
0 siblings, 2 replies; 4+ messages in thread
From: tim @ 2010-05-18 23:34 UTC
To: linux-kernel; +Cc: Andi Kleen
We created a token jar library that implements a per-cpu cache of
tokens, to avoid lock contention whenever we retrieve a token from
or return a token to a token jar. Using this library with tmpfs,
we found that Aim7 fserver throughput improved by 270%
on a 4-socket, 32-core NHM-EX system.

In the current implementation of tmpfs, stat_lock in shmem_sb_info
must be acquired whenever we get a new page.
This causes heavy lock contention when multiple
threads use tmpfs simultaneously, so systems
with a large number of cpus scale poorly.
Almost 75% of cpu time was spent contending on
stat_lock when we ran the Aim7 fserver load with 128 threads
on a 4-socket, 32-core NHM-EX system.

The first patch in the series implements the quick token jar.
The second patch updates the shmem code of tmpfs to use this
library to improve tmpfs performance.
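
To illustrate the idea only, here is a minimal user-space sketch of a
token jar with a local cache. It is not the code from the patches: the
names (token_jar, jar_cache, jar_get, jar_put, JAR_BATCH) and the batch
size are hypothetical, a pthread mutex stands in for the kernel
spinlock, and a caller-owned cache stands in for per-cpu data. The
shared lock is taken only when the local cache runs dry or overflows,
so most get/put calls touch no shared state.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define JAR_BATCH 16            /* hypothetical refill/flush batch size */

struct token_jar {
    pthread_mutex_t lock;       /* protects pool */
    long pool;                  /* tokens left in the shared pool */
};

struct jar_cache {
    long tokens;                /* tokens held locally by one thread */
};

static void jar_init(struct token_jar *jar, long total)
{
    pthread_mutex_init(&jar->lock, NULL);
    jar->pool = total;
}

/* Take one token. Returns false if neither the local cache nor the
 * shared pool has any left. Tokens cached by other threads are not
 * reclaimed in this sketch. */
static bool jar_get(struct token_jar *jar, struct jar_cache *c)
{
    if (c->tokens == 0) {
        /* Slow path: refill the local cache in one batch. */
        pthread_mutex_lock(&jar->lock);
        long n = jar->pool < JAR_BATCH ? jar->pool : JAR_BATCH;
        jar->pool -= n;
        pthread_mutex_unlock(&jar->lock);
        c->tokens = n;
        if (n == 0)
            return false;
    }
    c->tokens--;                /* fast path: no shared lock */
    return true;
}

/* Return one token, flushing a batch to the pool if the cache is full. */
static void jar_put(struct token_jar *jar, struct jar_cache *c)
{
    c->tokens++;
    if (c->tokens > 2 * JAR_BATCH) {
        pthread_mutex_lock(&jar->lock);
        jar->pool += JAR_BATCH;
        pthread_mutex_unlock(&jar->lock);
        c->tokens -= JAR_BATCH;
    }
}

#ifdef TOKEN_JAR_DEMO
int main(void)
{
    struct token_jar jar;
    struct jar_cache cache = { 0 };
    long taken = 0;

    jar_init(&jar, 100);
    while (jar_get(&jar, &cache))
        taken++;
    printf("took %ld tokens\n", taken);
    return 0;
}
#endif
```

Building with -DTOKEN_JAR_DEMO -pthread gives a small single-threaded
demo. In a real multi-threaded user each thread would own its own
jar_cache, much as each cpu would own its cache in the kernel.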
Regards,
Tim Chen
* Re: [PATCH 0/2] tmpfs: Improve tmpfs scalability
From: Andi Kleen @ 2010-05-19 9:27 UTC
To: tim; +Cc: linux-kernel
tim wrote:
>
> We created a token jar library that implements a per-cpu cache of
> tokens, to avoid lock contention whenever we retrieve a token from
> or return a token to a token jar. Using this library with tmpfs,
> we found that Aim7 fserver throughput improved by 270%
> on a 4-socket, 32-core NHM-EX system.
>
> In the current implementation of tmpfs, stat_lock in shmem_sb_info
> must be acquired whenever we get a new page.
> This causes heavy lock contention when multiple
> threads use tmpfs simultaneously, so systems
> with a large number of cpus scale poorly.
> Almost 75% of cpu time was spent contending on
> stat_lock when we ran the Aim7 fserver load with 128 threads
> on a 4-socket, 32-core NHM-EX system.
>
> The first patch in the series implements the quick token jar.
> The second patch updates the shmem code of tmpfs to use this
> library to improve tmpfs performance.
I reviewed both patches and they look good to me.
The token jar library in particular should be useful in other
places too.
Reviewed-by: Andi Kleen <ak@linux.intel.com>
-Andi
* Re: [PATCH 0/2] tmpfs: Improve tmpfs scalability
From: Hugh Dickins @ 2010-05-21 1:55 UTC
To: tim; +Cc: linux-kernel, Andi Kleen, Andrew Morton
On Tue, 18 May 2010, tim wrote:
>
> We created a token jar library that implements a per-cpu cache of
> tokens, to avoid lock contention whenever we retrieve a token from
> or return a token to a token jar. Using this library with tmpfs,
> we found that Aim7 fserver throughput improved by 270%
> on a 4-socket, 32-core NHM-EX system.
>
> In the current implementation of tmpfs, stat_lock in shmem_sb_info
> must be acquired whenever we get a new page.
> This causes heavy lock contention when multiple
> threads use tmpfs simultaneously, so systems
> with a large number of cpus scale poorly.
> Almost 75% of cpu time was spent contending on
> stat_lock when we ran the Aim7 fserver load with 128 threads
> on a 4-socket, 32-core NHM-EX system.
>
> The first patch in the series implements the quick token jar.
> The second patch updates the shmem code of tmpfs to use this
> library to improve tmpfs performance.
Interesting, thank you - I'll take a look, but not this week.
I do hope you're using Aim7 just as an example: you know that mounting
tmpfs with nr_blocks=0,nr_inodes=0 skips those shmem_sb_info updates
altogether? Mounting in such a way should be fine for getting better
numbers out of Aim7; but yes, there are real-life uses for tmpfs which
are safer with the nr_blocks,nr_inodes limits.
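
A minimal sketch of such a mount from C, for anyone wanting to compare
numbers with and without the limits. The helper names tmpfs_opts and
mount_tmpfs are hypothetical; only the mount(2) call and the
nr_blocks/nr_inodes option names are real. It needs root (or
CAP_SYS_ADMIN) and an existing target directory to actually mount.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/* Build the tmpfs option string; a value of 0 means "no limit".
 * Returns the snprintf() result (length the string needs). */
static int tmpfs_opts(char *buf, size_t len,
                      unsigned long nr_blocks, unsigned long nr_inodes)
{
    return snprintf(buf, len, "nr_blocks=%lu,nr_inodes=%lu",
                    nr_blocks, nr_inodes);
}

/* Mount a tmpfs instance at target with the given limits.
 * Returns 0 on success, -1 with errno set on failure. */
static int mount_tmpfs(const char *target,
                       unsigned long nr_blocks, unsigned long nr_inodes)
{
    char opts[64];

    tmpfs_opts(opts, sizeof(opts), nr_blocks, nr_inodes);
    return mount("tmpfs", target, "tmpfs", 0, opts);
}

#ifdef TMPFS_MOUNT_DEMO
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
        return 1;
    }
    /* Unlimited blocks and inodes, as in the benchmark-only setup. */
    if (mount_tmpfs(argv[1], 0, 0) != 0) {
        perror("mount");
        return 1;
    }
    return 0;
}
#endif
```

Passing nonzero values instead gives the limited configuration, which
is the case where the stat_lock accounting comes into play.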
Hugh
* Re: [PATCH 0/2] tmpfs: Improve tmpfs scalability
From: Tim Chen @ 2010-05-21 16:07 UTC
To: Hugh Dickins; +Cc: linux-kernel, Andi Kleen, Andrew Morton
On Thu, 2010-05-20 at 18:55 -0700, Hugh Dickins wrote:
>
> Interesting, thank you - I'll take a look, but not this week.
>
> I do hope you're using Aim7 just as an example: you know that mounting
> tmpfs with nr_blocks=0,nr_inodes=0 skips those shmem_sb_info updates
> altogether? Mounting in such a way should be fine for getting better
> numbers out of Aim7; but yes, there are real-life uses for tmpfs which
> are safer with the nr_blocks,nr_inodes limits.
>
Yes, Aim7 was provided as an example to illustrate the locking
bottleneck when block limits are imposed.
Tim