* Re: Page coloring HOWTO [ans]
@ 1999-01-31  0:04 Larry McVoy
  1999-01-31  7:25 ` David S. Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Larry McVoy @ 1999-01-31  0:04 UTC (permalink / raw)
  To: linux-kernel

Richard Gooch <rgooch@atnf.csiro.au>:
: > : >     (a) make sure that each process maps the same virtual addresses to 
: > : >         different locations in the cache, if possible.
: > : 
: > : >     (b) make sure that a contiguous chunk of virtual address space in
: > : >         one process occupies a contiguous chunk of cache, if possible.
: 
: OK, I was reading points (a) and (b) as though they were, in effect,
: the required specifications for an algorithm to yield the best
: pages. Are they just comments on how the particular algorithm you
: mentioned works?
: 
: I'd like to separate this into two issues. Firstly, requirements on
: how to lay out physical pages to minimise cache line aliasing.

Huh?  Direct mapped caches are all virtually indexed and physically
tagged, if I remember correctly.  Regardless, the buckets into which the
pages are sorted are set up such that if you were to allocate one page
from each bucket, then all of the pages would land in different chunks
of the cache by definition.   That's why you hash on physical addresses
when sorting the pages.

When looking for a page, you hash on the process' virtual address (plus
pid offset) because you have to have something, and what else are you
going to use?  That's the only deterministic thing you have.
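
As a sketch of the two hashes being described (the bucket count, the
helper names, and the PAGE_SIZE arithmetic are illustrative, not any
particular kernel's code):

#define NUM_BUCKETS	(L2_CACHE_SIZE / PAGE_SIZE)	/* one bucket per page-sized cache chunk */

/* Sorting a free page: its bucket is fixed by where the physical
 * address lands in a direct-mapped cache.
 */
int page_bucket(unsigned long paddr)
{
	return (paddr / PAGE_SIZE) % NUM_BUCKETS;
}

/* Looking for a page: hash the faulting virtual address, offset by
 * pid, so that different processes running the same binary fan out
 * across the cache instead of piling onto the same buckets.
 */
int wanted_bucket(unsigned long vaddr, int pid)
{
	return ((vaddr / PAGE_SIZE) + pid) % NUM_BUCKETS;
}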

: For the former issue, I'd like to establish whether you are saying
: that points (a) and (b) provide better pages than the simple "add plus
: modulus" scheme of generating goal colours?

It's not a question of "better", it's a question of "correct".  And you
don't generate colors for the physical pages except when you put them
in the buckets.  

I think you are mixing up the virtual addresses and physical addresses in
your thinking.  Do you have a copy of Hennessy and Patterson?  I'm sure they
talk about this or provide a pointer to a paper on it.


* Re: Page coloring HOWTO [ans]
  1999-01-31  0:04 Larry McVoy
@ 1999-01-31  7:25 ` David S. Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David S. Miller @ 1999-01-31  7:25 UTC (permalink / raw)
  To: lm; +Cc: linux-kernel


I'll just relay my experience when I played around with this, and the
distribution scheme I found worked best.

First a clarification:

   Page coloring, in the sense that we are talking about here,
   is 99% dealing with physically indexed secondary/third-level
   etc. caches.  Virtually indexed secondary/third-level caches
   are dinosaurs and they'll die before anyone cares if we cater to
   them (the two most recent I know of were HyperSparc and apparently
   some HP cpus did this).  (and next will be N-way set associative
   secondary/third-level physically indexed caches, where page coloring
   in any form will become close to irrelevant)

A point in terminology/implementation:

   As far as page allocation is concerned, our granularity is
   PAGE_SIZE.  However, the caches we want to "color" also index with
   some lower-order bits (that is, the cache line size is certainly
   smaller than PAGE_SIZE).  For the purposes of implementation, act
   as if the low-order indexing bits did not exist (in the end this
   means you don't need to know the cache line size to implement
   this; only the total cache size matters).

   Assume that each architecture has indicated the total cache size to
   us in asm/cache.h in the form of:

#define L2_CACHE_SIZE	(512 * 1024)

   for example.

   We end up using the following definition in our internal
   implementation to do our work:

#define PAGE_COLOR(X)	(((X) & (L2_CACHE_SIZE - 1)) >> PAGE_SHIFT)
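
To make the arithmetic concrete, a throwaway user-space illustration
(PAGE_SHIFT == 12, i.e. 4K pages, is an assumption):

#include <stdio.h>

#define PAGE_SHIFT	12
#define L2_CACHE_SIZE	(512 * 1024)
#define PAGE_COLOR(X)	(((X) & (L2_CACHE_SIZE - 1)) >> PAGE_SHIFT)

int main(void)
{
	/* 512K / 4K = 128 colors; addresses 512K apart collide. */
	printf("%lu\n", PAGE_COLOR(0x00000000UL));	/* 0 */
	printf("%lu\n", PAGE_COLOR(0x00001000UL));	/* 1: the next page */
	printf("%lu\n", PAGE_COLOR(0x00080000UL));	/* 0: 512K later, same color */
	return 0;
}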

The following is a distribution scheme which I found to work extremely
well in practice and testing:

    Add to task_struct a member "int cur_color;"

    Add to inode a member "int cur_color;"

    When giving a new address space to a process (via exec() or some
    other means, but not during fork/clone for example) set
    tsk->cur_color to zero.

    When allocating a new inode structure in the vfs, set
    inode->cur_color to zero.

    Now track page cache, page table allocation, and anonymous page
    faulting in the following way:

       a) At each anonymous page write fault, allocate a free page
          with color current->cur_color, and then increment this.

       b) At each page table page allocation, do the same as in #a

       c) At each addition of a new page into the page cache, allocate
          this page using the vfs object's inode->cur_color, and then
          increment.

(while considering the above scheme, consider the effects it has on
 mmap'd shared libraries etc.)
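
As a sketch of what steps (a) and (c) might look like in terms of the
get_colored_page() interface introduced just below (the call sites are
illustrative, not actual kernel code):

	unsigned long addr;

	/* Anonymous write fault: take a page of the task's current
	 * color; get_colored_page() advances the cursor so that
	 * consecutive faults walk the cache.
	 */
	addr = get_colored_page(GFP_USER, &current->cur_color);

	/* Page cache: same idea, but the cursor lives in the inode,
	 * so a file's pages are colored consistently for every
	 * process that maps it.
	 */
	addr = get_colored_page(GFP_USER, &inode->cur_color);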

The only thing left is to implement:

	unsigned long get_colored_page(int gfp_flags, int *color_ptr)

Doing it efficiently and with minimal code changes in the current page
allocator is left, for now, as an exercise for the reader.  I have
some ideas, and after some experimentation I'll try to describe them
here.

But right now I will say that four important issues here are:

1) It has to cost close to nothing.

2) It has to be "obviously correct".

3) It should not try too hard; this is a heuristic after all, and
   the first priority is to get some page to the caller quickly.

4) It should not contribute to memory fragmentation.

(note that satisfying #3 is probably the nicest way to satisfy #4)
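
A strawman of get_colored_page() against those four constraints (the
per-color free lists and the next_free field are invented for this
sketch; it is not the efficient in-allocator version being alluded to
above):

#define NUM_COLORS	(L2_CACHE_SIZE >> PAGE_SHIFT)	/* 128 for 512K/4K */

static struct page *color_free_list[NUM_COLORS];	/* binned by PAGE_COLOR() */

unsigned long get_colored_page(int gfp_flags, int *color_ptr)
{
	int color = *color_ptr;
	struct page *page;

	/* Advance the caller's cursor no matter what we return. */
	*color_ptr = (color + 1) % NUM_COLORS;

	/* Try the wanted color, but don't try hard (issue #3): fall
	 * straight back to the normal allocator so the caller gets
	 * some page quickly, and so we never split higher-order
	 * blocks just to chase a color (issue #4).
	 */
	page = color_free_list[color];
	if (page != NULL) {
		color_free_list[color] = page->next_free;	/* invented field */
		return page_address(page);
	}
	return __get_free_page(gfp_flags);
}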

Later,
David S. Miller
davem@redhat.com


* Re: Page coloring HOWTO [ans]
@ 1999-01-31  8:47 Colin Plumb
  0 siblings, 0 replies; 7+ messages in thread
From: Colin Plumb @ 1999-01-31  8:47 UTC (permalink / raw)
  To: lm; +Cc: linux-kernel

Larry McVoy wrote:

> Page allocation becomes hash on virtual address and take a page from
> the bucket.  However, here's the trick that fans them out in the cache:
> you hash on virtual address plus pid (I don't remember the exact details
> but you'll get it immediately when you implement it - you just want
> process 0 to take page 0 from bucket 0, process 1 to take page 0 from
> bucket 1, and so on).

Um, this is difficult, given that the same virtual->physical mapping
may be present in multiple processes.  Which pid do I use?

I think that assigning a colour to the vm_area_struct would be better.
You *would* have essentially random relationships between the colour offsets
of the various vm_area_structs in a process (text, data, stack).
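
A minimal sketch of what a per-vm_area_struct colour might mean
(vm_color is a hypothetical field, not one that exists):

/* Give each vm_area_struct a base colour when it is created, then
 * colour pages by their offset within the mapping, so that a mapping
 * occupies a contiguous chunk of the cache.
 */
int vma_page_color(struct vm_area_struct *vma, unsigned long addr)
{
	unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;

	return (vma->vm_color + pgoff) % NUM_COLORS;
}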

Erm, I just thought of COW issues, which arise because the data
segment is mapped COW from the backing file.  Do the COW-copied pages
have different colour offsets than the pages they're copied from?

I can see disadvantages to both possible answers.
-- 
	-Colin


* Re: Page coloring HOWTO [ans]
@ 1999-01-31 19:24 Larry McVoy
  1999-01-31 22:05 ` Steve VanDevender
  1999-02-01  7:20 ` David S. Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Larry McVoy @ 1999-01-31 19:24 UTC (permalink / raw)
  To: linux-kernel

davem@redhat.com (Hey, look where Dave is :-) says:

:    Page coloring, in the sense that we are talking about here,
:    is 99% dealing with physically indexed secondary/third-level
:    etc. caches.  Virtually indexed secondary/third-level caches
:    are dinosaurs 

Are you sure about that? We should come to agreement on terminology.
I thought that the HyperSparc was virtually indexed and virtually tagged,
with just about everything else being virtually indexed but physically
tagged.

: The following is a distribution scheme which I found to work extremely
: well in practice and testing:
: 
:     Add to task_struct a member "int cur_color;"
: 
:     Add to inode a member "int cur_color;"
: 
:     When giving a new address space to a process (via exec() or some
:     other means, but not during fork/clone for example) set
:     tsk->cur_color to zero.
: 
:     When allocating a new inode structure in the vfs, set
:     inode->cur_color to zero.
: 
:     Now track page cache, page table allocation, and anonymous page
:     faulting in the following way:
: 
:        a) At each anonymous page write fault, allocate a free page
:           with color current->cur_color, and then increment this.
: 
:        b) At each page table page allocation, do the same as in #a
: 
:        c) At each addition of a new page into the page cache, allocate
:           this page using the vfs object's inode->cur_color, and then
:           increment.

This has some nice attributes in that it will work well if a process
chooses to touch memory at a stride of exactly cache size.  However,
it's a little harder to tune for if you are an application writer because
different inputs to the program will give you different page colors.

You could argue it either way but I'm curious as to why not just use the
(pageno + pid) % cachesize alg.  Did you try this and find that it gave
consistently worse results?  I'd find that sort of hard to believe but 
your oblique reference to mmap'd shared libraries gave me a hint that you
might be onto something.  Can you explain the results please?


* Re: Page coloring HOWTO [ans]
  1999-01-31 19:24 Page coloring HOWTO [ans] Larry McVoy
@ 1999-01-31 22:05 ` Steve VanDevender
  1999-02-01  7:20 ` David S. Miller
  1 sibling, 0 replies; 7+ messages in thread
From: Steve VanDevender @ 1999-01-31 22:05 UTC (permalink / raw)
  To: linux-kernel

Larry McVoy writes:
 > davem@redhat.com (Hey, look where Dave is :-) says:
 > 
 > :    Page coloring, in the sense that we are talking about here,
 > :    is 99% dealing with physically indexed secondary/third-level
 > :    etc. caches.  Virtually indexed secondary/third-level caches
 > :    are dinosaurs 
 > 
 > Are you sure about that? We should come to agreement on terminology.
 > I thought that the HyperSparc was virtually indexed and virtually tagged,
 > with just about everything else being virtually indexed but physically
 > tagged.

Most of the x86 L2 direct-mapped cache systems I've seen would
appear to be physically indexed and physically tagged, if I
understand your terminology correctly.  The processor presents
physical addresses to the L2 cache, and the cache uses some of
the high-order physical address bits to tag cache lines.
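
To make the split concrete, a sketch (the 32-byte line size is an
assumption):

#define LINE_SIZE	32			/* assumed */
#define CACHE_SIZE	(512 * 1024)		/* direct mapped */
#define NUM_LINES	(CACHE_SIZE / LINE_SIZE)

/* A physical address decomposes into offset-within-line, line index,
 * and tag; two addresses collide iff they share an index but carry
 * different tags.
 */
unsigned long line_offset(unsigned long paddr) { return paddr & (LINE_SIZE - 1); }
unsigned long line_index(unsigned long paddr)  { return (paddr / LINE_SIZE) % NUM_LINES; }
unsigned long line_tag(unsigned long paddr)    { return paddr / CACHE_SIZE; }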

I suppose there could have been virtually-indexed caches in the
days when the MMU was almost always external to the CPU, but I'm
not aware of any recent architectures where the processor's
external address bus uses virtual addresses.


* Re: Page coloring HOWTO [ans]
  1999-01-31 19:24 Page coloring HOWTO [ans] Larry McVoy
  1999-01-31 22:05 ` Steve VanDevender
@ 1999-02-01  7:20 ` David S. Miller
  1 sibling, 0 replies; 7+ messages in thread
From: David S. Miller @ 1999-02-01  7:20 UTC (permalink / raw)
  To: lm; +Cc: linux-kernel

   From: lm@bitmover.com (Larry McVoy)
   Date: 	Sun, 31 Jan 1999 11:24:39 -0800

   davem@redhat.com (Hey, look where Dave is :-) says:

   :    Page coloring, in the sense that we are talking about here,
   :    is 99% dealing with physically indexed secondary/third-level
   :    etc. caches.  Virtually indexed secondary/third-level caches
   :    are dinosaurs 

   Are you sure about that? We should come to agreement on terminology.
   I thought that the HyperSparc was virtually indexed and virtually tagged,
   with just about everything else being virtually indexed but physically
   tagged.

MBUS speaks physical address, L2 caches on sun4m MBUS systems speak to
MBUS, therefore the cache is physically tagged and 2 + 2 == 4. :-)  The
same goes for the UPA bus on UltraSparc; busses and cache coherency
schemes work with a currency of physically addressable objects.

Anything made today that I know of is physically indexed and
physically tagged (PIPT for short).

   This has some nice attributes in that it will work well if a process
   chooses to touch memory at a stride of exactly cache size.  However,
   it's a little harder to tune for if you are an application writer because
   different inputs to the program will give you different page colors.

Can you be at least slightly more explicit?

There is another trick, btw: change the plain increment into something
like:

	tsk->cur_color = (tsk->cur_color + SOME_NICE_PRIME) % NUM_COLORS

This is what the final implementation I was playing with used.
With the proper prime you can obtain a perfect cycle for any
reasonable NUM_COLORS value.
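
The "perfect cycle" claim is easy to check: any step coprime to
NUM_COLORS visits every color exactly once before repeating.  A
throwaway user-space check (the constants are assumptions):

#include <stdio.h>

#define NUM_COLORS	128	/* 512K L2, 4K pages */
#define SOME_NICE_PRIME	31	/* coprime to 128 */

int main(void)
{
	char seen[NUM_COLORS] = { 0 };
	int color = 0, i;

	for (i = 0; i < NUM_COLORS; i++) {
		seen[color] = 1;
		color = (color + SOME_NICE_PRIME) % NUM_COLORS;
	}
	for (i = 0; i < NUM_COLORS; i++)
		if (!seen[i])
			printf("color %d never visited\n", i);
	return 0;	/* silence means a perfect cycle */
}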

   You could argue it either way but I'm curious as to why not just use the
   (pageno + pid) % cachesize alg.  Did you try this and find that it gave
   consistently worse results?

No, I did not try that; I'm eager to try it out once I work on some sample
implementations again.  At the time I was so happy with the results of
the algorithm I described that I didn't bother looking further.

   I'd find that sort of hard to believe but your oblique reference to
   mmap'd shared libraries gave me a hint that you might be onto
   something.  Can you explain the results please?

The results were (on 512K L2 UltraSparcs):

1) Kernel builds were faster by close to 40 seconds.

2) lat_ctx gave perfectly smooth graphs for all process sizes and
   numbers of processes up until the total size walked crossed the L2
   cache size boundary (512K in this case).  (in fact Larry, my graphs
   looked cleaner than the HP ones you always dribbled over; check
   your email archives, because I believe I sent these graphs to you
   while I was still at Rutgers)

3) Applications which were just data set number crunchers, which I
   knew gave wildly inconsistent results for each run before, gave
   completely reproducible results with the coloring.  Also, I could
   run parallel copies of these programs up to where they would have a
   total resident set size of the L2, and get consistent results as
   well.

The technique of coloring per-inode is very important.  Just consider
a simple example (in your head, the point is to visualize this).
Picture the first few programs which run on the system and begin
paging in parts of shared libc.  As each subsequent program uses some
new part of shared libc, it will color it relative to the color state
at the time of the page in.  Therefore, libc itself is colored for
everyone, it is much like a shared context.

Local anonymous pages for a process and page table pages are like a
non-shared local context.

One area which could be tuned is to discover whether it makes sense to
reset the per-process color counter to zero even at fork.  And
actually by my original description (do this at creation of each new
address space) this can be implied.  I did not do this in the
implementation I played with.

Another way to obtain a similar effect is to use half the number of
colors you really have in the L2 cache; it's like creating a pseudo
2-way set associative L2 cache.
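
Concretely, that is just a change of modulus (a sketch): with a 512K
cache and 4K pages you cycle through 64 colors instead of 128, so two
pages of each color can live in the cache at once.

#define NUM_COLORS	((L2_CACHE_SIZE >> PAGE_SHIFT) / 2)	/* 64, not 128 */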

In fact, in my experience, it seemed to make no sense to color past
an L2 cache size of 512K.

BTW, I am rather certain that the "color" terminology stems from the
fact that the problem being solved here in a mathematical sense
mirrors graph coloring (a fixed number of colors, the goal is to
prevent neighbouring nodes from having the same color, it cannot be
fully solved with bounded complexity in all cases, etc.)  Much
register allocation technology in compilers uses graph coloring to
model the problem as well.

Later,
David S. Miller
davem@redhat.com


* Re: Page coloring HOWTO [ans]
       [not found] <"4S1CG3.0.Tz5.-gvis"@tequila.cs.yale.edu>
@ 1999-02-01 23:11 ` Stefan Monnier
  0 siblings, 0 replies; 7+ messages in thread
From: Stefan Monnier @ 1999-02-01 23:11 UTC (permalink / raw)
  To: linux-kernel

>>>>> "Richard" == Richard Gooch <rgooch@atnf.csiro.au> writes:
> OK, I was reading points (a) and (b) as though they were, in effect,
> the required specifications for an algorithm to yield the best
> pages. Are they just comments on how the particular algorithm you
> mentioned works?

I believe the requirement is really something like "make the behavior more
predictable" (not just deterministic).  To reach this goal, you have to somehow
make sure that non-colliding virtual addresses turn into non-colliding physical
addresses and vice-versa (this way the compiler gets a fair chance to
statically figure out how to lay things out to avoid collisions).

Of course, the other possible requirement is just to make things more
deterministic.  It looks like Dave's alg is aiming for determinism rather than
predictability.


	Stefan

