Linus Torvalds <torvalds@transmeta.com> writes:

> On Wed, 6 Jan 1999, Steve Bergman wrote:
> > 
> > Here are my latest numbers.  This is timing a complete kernel compile  (make
> > clean;make depend;make;make modules;make modules_install)  in 16MB memory with
> > netscape, kde, and various daemons running.  I unknowningly had two more daemons
> > running in the background this time than last so the numbers can't be compared
> > directly with my last test (Which I think I only sent to Andrea).  But all of
> > these numbers are consistent with *each other*.
> > 
> > 
> > kernel		Time	Maj pf	Min pf  Swaps
> > ----------	-----	------	------	-----
> > 2.2.0-pre5		18:19	522333	493803	27984
> > arcavm10		19:57	556299	494163	12035
> > arcavm9		19:55	553783	494444	12077
> > arcavm7		18:39	538520	493287	11526
> 
> Don't look too closely at the "swaps" number - I think pre-5 just changed
> accounting a bit. A lot of the "swaps" are really just dropping a virtual
> mapping (that is later picked up again from the page cache or the swap
> cache). 
> 
> Basically, pre-5 uses the page cache and the swap cache more actively as a
> "victim cache", and that inflates the "swaps" number simply due to the
> accounting issues. 
> 
> I guess I shouldn't count the simple "drop_pte" operation as a swap at
> all, because it doesn't involve any IO.
> 

2.2.0-pre5 works very good, indeed, but it still has some not
sufficiently explored nuisances:

1) Swap performance in pre-5 is much worse compared to pre-4 in
*certain* circumstances. I'm using quite stupid and unintelligent
program to check for raw swap speed (attached below). With 64 MB of
RAM I usually run it as 'hogmem 100 3' and watch for result which is
recently around 6 MB/sec. But when I lately decided to start two
instances of it like "hogmem 50 3 & hogmem 50 3 &" in pre-4 I got 2 x
2.5 MB/sec and in pre-5 it is only 2 x 1 MB/sec and disk is making
very weird and frightening sounds. My conclusion is that now (pre-5)
system behaves much poorer when we have more than one thrashing
task. *Please*, check this, it is a quite serious problem.

2) In pre-5, under heavy load, free memory is hovering around
freepages.min instead of being somewhere between freepages.low &
freepages.max. This could make trouble for bursts of atomic
allocations (networking!).

3) Nitpick #1: /proc/swapstats exist but is only filled with
zeros. Probably it should go away. I believe Stephen added it
recently, but only part of his patch got actually applied.

4) Nitpick #2": "Swap cache:" line in report of Alt-SysRq-M is not
useful as it is laid now. People have repeatedly sent patches (Rik,
Andrea...) to fix this but it is still not fixed, as of pre-5.

5) There is lots of #if 0 constructs in MM code, and also lots of
structures are not anymore used but still take precious memory in
compiled kernel and uncover itself under /proc (/proc/sys/vm/swapctl
for instance). Do you want a patch to remove this cruft?

6) Finally one suggestion of mine. In swapfile.c there is comment:

         * We try to cluster swap pages by allocating them
         * sequentially in swap.  Once we've allocated
         * SWAP_CLUSTER_MAX pages this way, however, we resort to
         * first-free allocation, starting a new cluster.  This
         * prevents us from scattering swap pages all over the entire
         * swap partition, so that we reduce overall disk seek times

This is good, but clustering of only 32 (SWAP_CLUSTER_MAX) * 4KB =
128KB is too small for today's disk and swap sizes. I tried to enlarge
this value to something like 2 MB and got much much better results.
This is very important now that we have swapin readahead to keep pages
as adjacent as possible to each other so hit rate is big. It is
trivial (one liner) and completely safe to make this constant much
bigger, so I'm not even attaching a patch. 512 works very well and
swapping is much faster than with default valuein place. Maybe this
should even be sysctl controllable. If you agree with the last idea,
I'll send you a patch, just confirm.

I promised memory hogger: