Linus Torvalds writes:
> On Wed, 6 Jan 1999, Steve Bergman wrote:
> >
> > Here are my latest numbers.  This is timing a complete kernel compile (make
> > clean;make depend;make;make modules;make modules_install) in 16MB memory with
> > netscape, kde, and various daemons running.  I unknowingly had two more daemons
> > running in the background this time than last so the numbers can't be compared
> > directly with my last test (Which I think I only sent to Andrea).  But all of
> > these numbers are consistent with *each other*.
> >
> >
> > kernel      Time   Maj pf  Min pf  Swaps
> > ----------  -----  ------  ------  -----
> > 2.2.0-pre5  18:19  522333  493803  27984
> > arcavm10    19:57  556299  494163  12035
> > arcavm9     19:55  553783  494444  12077
> > arcavm7     18:39  538520  493287  11526
>
> Don't look too closely at the "swaps" number - I think pre-5 just changed
> accounting a bit. A lot of the "swaps" are really just dropping a virtual
> mapping (that is later picked up again from the page cache or the swap
> cache).
>
> Basically, pre-5 uses the page cache and the swap cache more actively as a
> "victim cache", and that inflates the "swaps" number simply due to the
> accounting issues.
>
> I guess I shouldn't count the simple "drop_pte" operation as a swap at
> all, because it doesn't involve any IO.
>

2.2.0-pre5 works very well indeed, but it still has some insufficiently
explored nuisances:

1) Swap performance in pre-5 is much worse than in pre-4 in *certain*
   circumstances. I'm using a quite simple-minded program to check raw
   swap speed (attached below). With 64 MB of RAM I usually run it as
   'hogmem 100 3' and watch the result, which has recently been around
   6 MB/sec. But when I recently started two instances of it, like
   "hogmem 50 3 & hogmem 50 3 &", I got 2 x 2.5 MB/sec in pre-4 and
   only 2 x 1 MB/sec in pre-5, and the disk makes very weird and
   frightening sounds.
   My conclusion is that pre-5 behaves much worse when there is more
   than one thrashing task. *Please* check this; it is quite a serious
   problem.

2) In pre-5, under heavy load, free memory hovers around freepages.min
   instead of staying somewhere between freepages.low and
   freepages.high. This could cause trouble for bursts of atomic
   allocations (networking!).

3) Nitpick #1: /proc/swapstats exists but is filled only with zeros.
   It should probably go away. I believe Stephen added it recently,
   but only part of his patch actually got applied.

4) Nitpick #2: The "Swap cache:" line in the Alt-SysRq-M report is not
   useful as it stands. People (Rik, Andrea...) have repeatedly sent
   patches to fix this, but it is still not fixed as of pre-5.

5) There are lots of #if 0 constructs in the MM code, and lots of
   structures that are no longer used but still take up precious
   memory in the compiled kernel and show up under /proc
   (/proc/sys/vm/swapctl, for instance). Do you want a patch to remove
   this cruft?

6) Finally, one suggestion of mine. In swapfile.c there is this comment:

	 * We try to cluster swap pages by allocating them
	 * sequentially in swap.  Once we've allocated
	 * SWAP_CLUSTER_MAX pages this way, however, we resort to
	 * first-free allocation, starting a new cluster.  This
	 * prevents us from scattering swap pages all over the entire
	 * swap partition, so that we reduce overall disk seek times

   This is good, but a cluster of only 32 (SWAP_CLUSTER_MAX) * 4KB =
   128KB is too small for today's disk and swap sizes. I tried
   enlarging it to something like 2 MB and got much, much better
   results. Now that we have swapin readahead, it is very important to
   keep pages as adjacent to each other as possible so that the
   readahead hit rate is high. Making this constant much bigger is
   trivial (a one-liner) and completely safe, so I'm not even
   attaching a patch. 512 works very well, and swapping is much faster
   than with the default value in place. Maybe this should even be
   sysctl-controllable.
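
To make the cluster-size arithmetic in point 6 concrete, here is a small sketch in C. The helper names swap_cluster_bytes() and swap_cluster_kb() are made up for illustration and are not kernel functions; the 4096-byte page size is an assumption taken from the "4KB" figure above.

```c
/*
 * Hypothetical helpers, not kernel code: how much contiguous swap
 * space one cluster spans for a given SWAP_CLUSTER_MAX value.
 */

/* Bytes covered by a cluster of cluster_pages pages. */
unsigned long swap_cluster_bytes(unsigned long cluster_pages,
				 unsigned long page_size)
{
	return cluster_pages * page_size;
}

/* The same span expressed in KB (1 KB = 1024 bytes). */
unsigned long swap_cluster_kb(unsigned long cluster_pages,
			      unsigned long page_size)
{
	return swap_cluster_bytes(cluster_pages, page_size) / 1024;
}
```

With 4096-byte pages, swap_cluster_kb(32, 4096) gives 128 (the current 128KB cluster) and swap_cluster_kb(512, 4096) gives 2048, i.e. the 2 MB cluster suggested above.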

If you agree with the last idea, I'll send you a patch; just confirm.

As promised, the memory hogger: