Linux-mm Archive mirror
* Re: 2.1.130 mem usage.
       [not found] <199812021749.RAA04575@dax.scot.redhat.com>
@ 1998-12-11  0:38 ` Andrea Arcangeli
  1998-12-11 14:05   ` Stephen C. Tweedie
  0 siblings, 1 reply; 4+ messages in thread
From: Andrea Arcangeli @ 1998-12-11  0:38 UTC
  To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm, Rik van Riel, Linus Torvalds

On Wed, 2 Dec 1998, Stephen C. Tweedie wrote:

>>> +		/* 
>>> +		 * If the page we looked at was recyclable but we didn't
>>> +		 * reclaim it (presumably due to PG_referenced), don't
>>> +		 * count it as scanned.  This way, the more referenced
>>> +		 * page cache pages we encounter, the more rapidly we
>>> +		 * will age them. 
>>> +		 */
>>> +		if (atomic_read(&page->count) != 1 ||
>>> +		    (!page->inode && !page->buffers))
>>> +			count_min--;
>
>> I don't think count_min should count the number of tries on pages we have
>> no chance to free. It should be the opposite, according to me.
>
>No, the objective is not to swap unnecessarily, but still to start
>swapping if there is too much pressure on the cache.

My idea is that your patch works well for a subtle reason. The effect of the
patch is that we only try a few freeable pages, so we clear only a few
reference bits and therefore don't throw away aging (just the opposite of
what you wrote in the comment :). The reason it works is that there are
many more non-freeable pages than orphaned, unused ones.

shrink_mmap 30628, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30644, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0
shrink_mmap 30705, 0

The two numbers are count_max and count_min just before returning from
shrink_mmap() with your patch applied (with the mm subsystem stressed
heavily by my leak program). Basically your patch causes shrink_mmap() to
operate on only a very small portion of memory each time. This gives you
the chance to reference a page again and reset its referenced flag,
keeping the kernel from really dropping the page and then having to do I/O
to page it back in later...

So basically it's the same as setting count_min to 100-200 pages (instead
of 10000/20000) and decreasing count_min where your patch does not
decrease it.

That's the only reason you can switch between two virtual desktops
without I/O. The old shrink_mmap used to throw out even our minimal
cached working set. With the patch applied we instead fail much more
easily in shrink_mmap() and our working set is preserved (cool!).
Basically, without the patch, all older kernels exit do_try_to_free_pages
from state == 0 (because shrink_mmap failed) only when we are forced to
do I/O to read pages back from disk.

There are still two mm cycles:

top:
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	swapout == cache++ == state 1
	last time I checked, swapout was not able to fail, but since we are \
	over pg_borrow, state has now been set to 0 by me
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	shrink_mmap() == cache-- == state 0
	here, with the old shrink_mmap pressure, we used to lose our working\
	set and everything went bad... with your patch the working set\
	is preserved because you have time to reference the pages
	shrink_mmap() failed so state == 1
	goto top

but as you can see, at the end of the mm cycle with your patch the cached
working set is preserved. I think the natural way to do that is to decrease
the pressure, but decreasing count_min very fast has the same effect.

Practically, we could also drop count_max, since it never happens (at
least here) that we stop because it reaches 0.

I am very tired :( so right now my mind refuses to think about whether it
would be better to set count_min to something like
(limit >> 2) >> (priority >> 1) and reverse the check.

As for the s/free_page_and_swap_cache/free_page/ change, I agree with it
completely. I only want to be sure that the other mm parts are well
balanced with the change.

I guess that joining the filemap patch with the s/free.../free.../ patch
makes do_try_to_free_pages switch more easily from one state to the next,
and the system is probably better balanced than 2.1.130 that way.

It would also be nice not to have two separate mm cycles (one that grows
the cache up to the borrow percentage, and another that shrinks it very
close to the limit of the working set). We should always have the same
level of cache in the system if the mm stress is constant. This could
easily be done with a state++ inside do_try_to_free_pages() after some
number (how many??) of successful returns.
We should also take care not to decrease i (the priority) if we switched
due to a balancing factor (and not because we failed). I'll try that in my
next bit of spare time...

Comments? (Today I am really very tired, so my mind may fail right now...)

Andrea Arcangeli

--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org


* Re: 2.1.130 mem usage.
  1998-12-11  0:38 ` 2.1.130 mem usage Andrea Arcangeli
@ 1998-12-11 14:05   ` Stephen C. Tweedie
  1998-12-11 18:08     ` Andrea Arcangeli
  0 siblings, 1 reply; 4+ messages in thread
From: Stephen C. Tweedie @ 1998-12-11 14:05 UTC
  To: Andrea Arcangeli
  Cc: Stephen C. Tweedie, linux-kernel, linux-mm, Rik van Riel,
	Linus Torvalds

Hi,

On Fri, 11 Dec 1998 01:38:47 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:

>>>> +		if (atomic_read(&page->count) != 1 ||
>>>> +		    (!page->inode && !page->buffers))
>>>> +			count_min--;

> My idea is that your patch works well for a subtle reason. The effect of
> the patch is that we only try a few freeable pages, so we clear only a
> few reference bits and therefore don't throw away aging (just the
> opposite of what you wrote in the comment :). The reason it works is
> that there are many more non-freeable pages than orphaned, unused ones.

> So basically it's the same as setting count_min to 100-200 pages
> (instead of 10000/20000) and decreasing count_min where your patch does
> not decrease it.

No, no, not at all.  The whole point is that this patch does indeed
behave as you describe if the cache is small or moderately sized, but
if you have something like a "cat /usr/bin/* > /dev/null" going on,
the large fraction of cached but referenced pages will cause the new
code to become more aggressive in its scanning (because the pages
which contribute to the loop exit condition become more dilute).  This
is exactly what you want for self-balancing behaviour.

> As for the s/free_page_and_swap_cache/free_page/ change, I agree with it
> completely. I only want to be sure that the other mm parts are well
> balanced with the change.

Please try 2.1.131-ac8, then, as it not only includes the patches
we're talking about here, but it also adds Rik's swap readahead stuff
extended to do aligned block readahead for both swap and normal mmap
paging. 

> It would also be nice not to have two separate mm cycles (one that
> grows the cache up to the borrow percentage, and another that shrinks
> it very close to the limit of the working set). We should always have
> the same level of cache in the system if the mm stress is constant.
> This could easily be done with a state++ inside do_try_to_free_pages()
> after some number (how many??) of successful returns.

I'm seeing a pretty stable cache behaviour here, on everything from
4MB to 64MB systems.

--Stephen

* Re: 2.1.130 mem usage.
  1998-12-11 14:05   ` Stephen C. Tweedie
@ 1998-12-11 18:08     ` Andrea Arcangeli
  1998-12-12 15:14       ` Andrea Arcangeli
  0 siblings, 1 reply; 4+ messages in thread
From: Andrea Arcangeli @ 1998-12-11 18:08 UTC
  To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm, Rik van Riel, Linus Torvalds

On Fri, 11 Dec 1998, Stephen C. Tweedie wrote:

>the large fraction of cached but referenced pages will cause the new
>code to become more aggressive in its scanning (because the pages
>which contribute to the loop exit condition become more dilute).  This
>is exactly what you want for self-balancing behaviour.

Yes, that is what I want. With my past email I only wanted to point out
that, if I remember well, you published the patch as a fix for excessive
swapout (see the reports from people pointing at the swpd field of
`vmstat 1`). Your patch will instead cause still more swapout; note that
I am not talking about I/O. This is the reason I didn't agree with your
patch at first: I thought you would get the opposite effect (and I
couldn't understand why it could improve things). The reason is that your
patch causes less I/O (cool) since the cache working set is preserved
fine. I agree with the patch insofar as I agree with decreasing the
pressure on shrink_mmap(). Also, your comment is not exhaustive, since it
says the new check will cause the cache to be aged faster, while instead
it _radically_ reduces the pressure of shrink_mmap(), so the cache will
be aged more slowly than with the previous code. The improvement comes
not because we age faster but because we age more slowly and don't throw
away the cache of our working set (thus avoiding a lot of unneeded slow
I/O).

As always, correct me if I am wrong or misunderstanding something.

>> As for the s/free_page_and_swap_cache/free_page/ change, I agree with
>> it completely. I only want to be sure that the other mm parts are well
>> balanced with the change.
>
>Please try 2.1.131-ac8, then, as it not only includes the patches

I am just running with the ac6 mm (except for kswapd, but that makes no
difference for what we are discussing here, since do_try_to_free_pages()
is the same). ac6 seems good to me (for the reason above) and now it
makes sense to me (too ;).

>we're talking about here, but it also adds Rik's swap readahead stuff
>extended to do aligned block readahead for both swap and normal mmap
>paging. 

Downloading ac8 from here is a pain (I used to get patches from
linux-kernel-patches). A guy sent me ac7 by email, but since I want to
sync with ac8 I'll wait a bit for ac8...

>> It would also be nice not to have two separate mm cycles (one that
>> grows the cache up to the borrow percentage, and another that shrinks
>> it very close to the limit of the working set). We should always have
>> the same level of cache in the system if the mm stress is constant.
>> This could easily be done with a state++ inside do_try_to_free_pages()
>> after some number (how many??) of successful returns.
>
>I'm seeing a pretty stable cache behaviour here, on everything from
>4MB to 64MB systems.

It works fine, but it's not stable at all. The cache here cycles between
40 MB and 10 MB (the only local changes I have here are to the kswapd
implementation; do_try_to_free_pages() and all the functions it uses are
untouched). The good thing is that now, when the cache reaches the low
bound, the working set is preserved (this is achieved by decreasing (not
increasing, as it seemed to me reading the comment some days ago) the
pressure of shrink_mmap()).

Now I'll try removing my state = 0 to see what happens... My state = 0
is the reason for the mm cycle I am seeing here, but it is also the
reason the mm subsystem doesn't swap out too much. I'll experiment
now...


* Re: 2.1.130 mem usage.
  1998-12-11 18:08     ` Andrea Arcangeli
@ 1998-12-12 15:14       ` Andrea Arcangeli
  0 siblings, 0 replies; 4+ messages in thread
From: Andrea Arcangeli @ 1998-12-12 15:14 UTC
  To: Stephen C. Tweedie; +Cc: linux-kernel, linux-mm, Rik van Riel, Linus Torvalds

On Fri, 11 Dec 1998, Andrea Arcangeli wrote:

>>> It would also be nice not to have two separate mm cycles (one that
>>> grows the cache up to the borrow percentage, and another that shrinks
>>> it very close to the limit of the working set). We should always have
>>> the same level of cache in the system if the mm stress is constant.
>>> This could easily be done with a state++ inside do_try_to_free_pages()
>>> after some number (how many??) of successful returns.
>>
>>I'm seeing a pretty stable cache behaviour here, on everything from
>>4MB to 64MB systems.
>
>It works fine, but it's not stable at all. The cache here cycles

This patch should rebalance the swapping/mmap-shrinking (and it seems to
work here, even though my kswapd actually starts when buf/cache are over
max and stops when they are under borrow; I don't remember, without
looking at the code, what the stock kswapd does):

Index: vmscan.c
===================================================================
RCS file: /var/cvs/linux/mm/vmscan.c,v
retrieving revision 1.1.1.1.2.16
diff -u -r1.1.1.1.2.16 vmscan.c
--- vmscan.c	1998/12/12 12:31:57	1.1.1.1.2.16
+++ linux/mm/vmscan.c	1998/12/12 14:27:55
@@ -439,7 +439,8 @@
 	kmem_cache_reap(gfp_mask);
 
 	if (buffer_over_borrow() || pgcache_over_borrow())
-		state = 0;
+		if (shrink_mmap(i, gfp_mask))
+			return 1;
 	if (atomic_read(&nr_async_pages) > pager_daemon.swap_cluster / 2)
 		shrink_mmap(i, gfp_mask);
 

The patch basically avoids the clobbering of state, so the mm remains
always in the `swapout' state, but the cache stays close to the borrow
percentage. I should have done that from time 0 instead of using
state = 0...

Andrea Arcangeli

