Re: update re: fork() failures in 2.1.101

* Re: update re: fork() failures in 2.1.101
       [not found] <19980611173940.51846@adore.lightlink.com>
@ 1998-06-12  4:36 ` Rik van Riel
  1998-06-12 22:58   ` Stephen C. Tweedie
  0 siblings, 1 reply; 8+ messages in thread
From: Rik van Riel @ 1998-06-12  4:36 UTC
  To: Paul Kimoto; +Cc: Linux MM

[Paul gets "cannot fork" errors after 60 or more hours of
 uptime. This suggests fragmentation problems.]

On Thu, 11 Jun 1998, Paul Kimoto wrote:

> > Hmm, the 'cannot fork' issue only starting after some
> > days of uptime... This suggests fragmentation. Is your
> > box very heavily loaded, or just lightly (VM-wise)?
> 
> Light, I think; I have 48MB of RAM and usually end up with 8--16MB in swap.
> In normal operation I don't have to wait much for paging except for larger
> programs (netscape, xemacs, or big compilations).

Ahh, I think I see it now. The fragmentation on your system
persists because of the swap cache. The swap cache 'caches'
swap pages and kinda makes sure they are reloaded to the
same physical address.

Stephen, Ben: should we disable the swap cache when 
fragmentation is high?

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

* Re: update re: fork() failures in 2.1.101
  1998-06-12  4:36 ` update re: fork() failures in 2.1.101 Rik van Riel
@ 1998-06-12 22:58   ` Stephen C. Tweedie
  0 siblings, 0 replies; 8+ messages in thread
From: Stephen C. Tweedie @ 1998-06-12 22:58 UTC
  To: Rik van Riel; +Cc: Paul Kimoto, Linux MM

Hi,

On Fri, 12 Jun 1998 06:36:53 +0200 (MET DST), Rik van Riel
<H.H.vanRiel@phys.uu.nl> said:

> [Paul gets "cannot fork" errors after 60 or more hours of
>  uptime. This suggests fragmentation problems.]

Kernel version?

> Ahh, I think I see it now. The fragmentation on your system persists
> because of the swap cache. The swap cache 'caches' swap pages and
> kinda makes sure they are reloaded to the same physical address.

No.  As it stands right now, the "caching" component of the swap cache
is an *on disk* cache of resident pages.  Once the pages are swapped
out they are paged back in anywhere appropriate.  That part of the
fragmentation does not persist.

The real problem is not the swapper, I suspect, but the various consumers of
the slab cache (especially the dcache).  The slab allocator has some really
nasty properties; just one single in-use object will pin an entire slab
(up to 32k) into memory.  If the slabs become small, then it will be 4k
pages which get so pinned, and at that point we cannot allocate any
stack pages.  There are a number of ways we may tackle this in 2.1, but
disabling the swap cache won't help at all.
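
To illustrate the pinning effect Stephen describes, here is a toy
model in Python. It is only a sketch under stated assumptions: the
object size, the object count, and the pinned_pages() helper are made
up for illustration, and the real slab allocator is far more involved.

```python
PAGE_SIZE = 4096                       # 4k pages, as in the message above
OBJ_SIZE = 128                         # hypothetical object size
OBJS_PER_SLAB = PAGE_SIZE // OBJ_SIZE  # 32 objects per single-page slab


def pinned_pages(live_objects):
    """Count slabs (pages) that still hold at least one live object.

    Objects are numbered in allocation order, so object i lives in slab
    i // OBJS_PER_SLAB. A slab can only go back to the page allocator
    once every object in it has been freed.
    """
    return len({obj // OBJS_PER_SLAB for obj in live_objects})


# Allocate 3200 objects, which fills exactly 100 slabs.
allocated = set(range(3200))
full = pinned_pages(allocated)

# Free everything except the first object of each slab (about 97% freed).
survivors = {obj for obj in allocated if obj % OBJS_PER_SLAB == 0}
after_free = pinned_pages(survivors)

print(full, len(survivors), after_free)
```

In this model, freeing almost all of the objects releases no pages at
all, because each slab still holds one survivor.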

--Stephen

* Re: update re: fork() failures in 2.1.101
       [not found] <19980618235448.18503@adore.lightlink.com>
@ 1998-06-19  7:33 ` Rik van Riel
  1998-06-19 15:01   ` update re: fork() failures [in 2.1.103] Paul Kimoto
  1998-06-21 20:19   ` update re: fork() failures in 2.1.103 Paul Kimoto
  0 siblings, 2 replies; 8+ messages in thread
From: Rik van Riel @ 1998-06-19  7:33 UTC
  To: Paul Kimoto; +Cc: Linux MM

[CC-ed to linux-mm, and it should stay that way...]

On Thu, 18 Jun 1998, Paul Kimoto wrote:

> For completeness, here is the fragmentation report for each:
> > Jun 18 01:24:48   ( 48*4kB 7*8kB 1*16kB 1*32kB 4*64kB 1*128kB = 680kB)
> > Jun 18 18:03:53   ( 1*4kB 28*8kB 39*16kB 2*32kB 1*64kB 1*128kB = 1108kB)

Damn, this looks near-perfect for normal system load...
I really don't understand what's wrong.
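
The two free-list reports quoted above can be checked with a short
Python sketch. The helper names are hypothetical, and the 8 kB figure
is an assumption: it supposes that fork() needs one contiguous 8 kB
block for the task structure and kernel stack, which the thread does
not state outright.

```python
import re


def parse_free_list(report):
    """Return (block_count, block_size_kB) pairs from one report line."""
    return [(int(n), int(kb)) for n, kb in re.findall(r"(\d+)\*(\d+)kB", report)]


def total_free_kb(report):
    """Sum of count * size over all block sizes in the report."""
    return sum(n * kb for n, kb in parse_free_list(report))


def blocks_at_least(report, min_kb):
    """Number of free blocks whose size is at least min_kb."""
    return sum(n for n, kb in parse_free_list(report) if kb >= min_kb)


reports = {
    "Jun 18 01:24:48": "( 48*4kB 7*8kB 1*16kB 1*32kB 4*64kB 1*128kB = 680kB)",
    "Jun 18 18:03:53": "( 1*4kB 28*8kB 39*16kB 2*32kB 1*64kB 1*128kB = 1108kB)",
}

for stamp, line in reports.items():
    # Assumption: an 8 kB contiguous block is what fork() would need.
    print(stamp, total_free_kb(line), "kB free,",
          blocks_at_least(line, 8), "blocks of 8 kB or more")
```

Both totals agree with the sums printed in the reports, and both
snapshots show free blocks of 8 kB and larger.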

> If you have other suggestions for things to try, with the reduction in
> memory (from 48 MB) the problems seem to arise in about half the time.

I wonder what kind of software / networking app you are using,
and what memory usage those programs have...

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

* Re: update re: fork() failures [in 2.1.103]
  1998-06-19  7:33 ` update re: fork() failures in 2.1.101 Rik van Riel
@ 1998-06-19 15:01   ` Paul Kimoto
  1998-06-19 16:59     ` Rik van Riel
  1998-06-21 20:19   ` update re: fork() failures in 2.1.103 Paul Kimoto
  1 sibling, 1 reply; 8+ messages in thread
From: Paul Kimoto @ 1998-06-19 15:01 UTC
  To: Linux MM; +Cc: Rik van Riel

On Fri, Jun 19, 1998 at 09:33:54AM +0200, Rik van Riel wrote:
> I wonder what kind of software / networking app you are using,
> and what memory usage those programs have...

It's a mixed libc5/libc6 system.  
Here is a snapshot of the Top 20 in RSS:

%CPU %MEM  SIZE   RSS
 1.3 18.9 13552  5876 Xwrapper        XFree 3.3.2.2
 1.0 18.4 10612  5716 netscape        3.01
 0.0  5.6  4508  1740 kermitbeta      6.1.193 Beta.05
 1.2  5.1  4072  1584 rvplayer        5.0.0.35
 0.0  3.7  4372  1176 kermitbeta
 0.0  3.7  4372  1168 kermitbeta
 0.0  2.9  1824   908 named           8.1.2
 0.0  2.9   960   908 xntpd           3-5.91 (locked into memory)
 0.0  2.8  2584   876 xterm
 0.0  2.4  2420   748 xterm
 0.0  2.3  1448   716 zsh             3.1.4
 0.0  2.1  1380   676 zsh
 0.0  2.1  1380   676 zsh
 0.0  2.1  1404   668 perl            5.004_04
 0.0  1.9  1512   592 gnuplot_x11     3.5 (3.50.1.17)
 0.0  1.9  2164   592 xload
 0.0  1.6   932   520 pppd            2.3.5
95.7  1.6  9364   520 mprime          15.4.2 (internet Mersenne prime search)
 0.0  1.5  1756   496 gnuplot
 0.0  1.5   836   488 ps              1.2.4

	-Paul <kimoto@lightlink.com>
	 [please cc: relevant messages to me]

* Re: update re: fork() failures [in 2.1.103]
  1998-06-19 15:01   ` update re: fork() failures [in 2.1.103] Paul Kimoto
@ 1998-06-19 16:59     ` Rik van Riel
  1998-06-19 20:14       ` Paul Kimoto
  0 siblings, 1 reply; 8+ messages in thread
From: Rik van Riel @ 1998-06-19 16:59 UTC
  To: Paul Kimoto; +Cc: Linux MM

On Fri, 19 Jun 1998, Paul Kimoto wrote:

> On Fri, Jun 19, 1998 at 09:33:54AM +0200, Rik van Riel wrote:
> > I wonder what kind of software / networking app you are using,
> > and what memory usage those programs have...
> 
> It's a mixed libc5/libc6 system.  
> Here is a snapshot of the Top 20 in RSS:
> 
> %CPU %MEM  SIZE   RSS
>  1.3 18.9 13552  5876 Xwrapper        XFree 3.3.2.2
>  1.0 18.4 10612  5716 netscape        3.01

> 95.7  1.6  9364   520 mprime          15.4.2 (internet Mersenne prime search)

Shouldn't be much of a problem... But 'eh, does the
Mersenne program regularly do memory I/O?
It could be that it loads large chunks of memory and
frees small portions from the middle of it. The Linux
MM system could have a problem with that...

Of course we should be able to handle such stuff, but
with the current buddy allocator things might just get
a little bit tricky :(

The reason I picked this process is that its RSS is
only one 18th of its total size, which is somewhat
weird for a 'normal' Unix process.
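
The "one 18th" figure can be reproduced from Paul's listing with a
few lines of Python. This is a minimal sketch: only some rows are
copied in, and the cutoff of 10 is an arbitrary value chosen for
illustration, not anything used by the kernel or by ps.

```python
# (SIZE kB, RSS kB, command) rows copied from Paul's "Top 20 in RSS" listing.
rows = [
    (13552, 5876, "Xwrapper"),
    (10612, 5716, "netscape"),
    (4508, 1740, "kermitbeta"),
    (4072, 1584, "rvplayer"),
    (1824, 908, "named"),
    (9364, 520, "mprime"),
]

THRESHOLD = 10.0  # hypothetical cutoff for "unusually sparse" processes


def size_to_rss(size_kb, rss_kb):
    """Ratio of total virtual size to resident set size."""
    return size_kb / rss_kb


outliers = [name for size, rss, name in rows
            if size_to_rss(size, rss) > THRESHOLD]

for size, rss, name in rows:
    print(f"{name:12s} SIZE/RSS = {size_to_rss(size, rss):5.1f}")
print("outliers:", outliers)
```

Among these rows only mprime exceeds the cutoff. The other processes
sit between roughly 2 and 3.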

Rik.
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

* Re: update re: fork() failures [in 2.1.103]
  1998-06-19 16:59     ` Rik van Riel
@ 1998-06-19 20:14       ` Paul Kimoto
  1998-06-20  0:48         ` George Woltman
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Kimoto @ 1998-06-19 20:14 UTC
  To: Rik van Riel; +Cc: Linux MM, woltman

On Fri, Jun 19, 1998 at 06:59:56PM +0200, Rik van Riel wrote:
>> %CPU %MEM  SIZE   RSS
>> 95.7  1.6  9364   520 mprime        15.4.2 (internet Mersenne prime search)

> Shouldn't be much of a problem... But 'eh, does the
> Mersenne program regularly do memory I/O?
> It could be that it loads large chunks of memory and
> frees small portions from the middle of it. The Linux
> MM system could have a problem with that...

> The reason I picked this process is that its RSS is
> only one 18th of its total size, which is somewhat
> weird for a 'normal' Unix process.

I *think* that it allocates a huge amount of memory,
then uses only a small portion of it.

The above shows an inconsistency between "ps" and "top":
  according to "ps",      SIZE=9364, RSS=404;
  but according to "top", SIZE= 500, RSS=404, SWAP=96.

"grep '^Vm' /proc/<pid>/status" says
> VmSize:     9364 kB
> VmLck:         0 kB
> VmRSS:       464 kB
> VmData:     8400 kB
> VmStk:        12 kB
> VmExe:        72 kB
> VmLib:       580 kB
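
The Vm* fields above are easy to pull apart programmatically. The
sketch below parses the values exactly as quoted (with the "> " quote
prefix removed). The parse_vm_fields() helper is a hypothetical name,
not part of any tool mentioned in this thread.

```python
# Output of "grep '^Vm' /proc/<pid>/status" as quoted in the message.
sample = """\
VmSize:     9364 kB
VmLck:         0 kB
VmRSS:       464 kB
VmData:     8400 kB
VmStk:        12 kB
VmExe:        72 kB
VmLib:       580 kB
"""


def parse_vm_fields(text):
    """Return a dict mapping each Vm* field name to its value in kB."""
    fields = {}
    for line in text.splitlines():
        if not line.startswith("Vm"):
            continue
        key, value = line.split(":", 1)
        fields[key] = int(value.split()[0])  # drop the trailing "kB"
    return fields


vm = parse_vm_fields(sample)
resident_fraction = vm["VmRSS"] / vm["VmSize"]
data_fraction = vm["VmData"] / vm["VmSize"]
print(f"resident: {resident_fraction:.1%}, data segment: {data_fraction:.1%}")
```

On a Linux system the same function could be fed the contents of
/proc/self/status. In these numbers most of the virtual size is in
the data segment while very little is resident, which fits the
"allocates a lot, uses a little" guess.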

	-Paul <kimoto@lightlink.com>

* Re: update re: fork() failures [in 2.1.103]
  1998-06-19 20:14       ` Paul Kimoto
@ 1998-06-20  0:48         ` George Woltman
  0 siblings, 0 replies; 8+ messages in thread
From: George Woltman @ 1998-06-20  0:48 UTC
  To: Paul Kimoto, Rik van Riel; +Cc: Linux MM

At 04:14 PM 6/19/98 -0400, Paul Kimoto wrote:
>
>I *think* that it allocates a huge amount of memory,
>then uses only a small portion of it.

This is indeed the case.  I know it's a sloppy programming practice,
but it was the easiest way for me to interface with all my assembly
code that assumes the FFT data is at a fixed address.  Mprime actually
has 16MB of global variables.  Unless you are testing an exponent above
20,000,000 then you are only using a small fraction of the 16MB.

Best regards,
George

* Re: update re: fork() failures in 2.1.103
  1998-06-19  7:33 ` update re: fork() failures in 2.1.101 Rik van Riel
  1998-06-19 15:01   ` update re: fork() failures [in 2.1.103] Paul Kimoto
@ 1998-06-21 20:19   ` Paul Kimoto
  1 sibling, 0 replies; 8+ messages in thread
From: Paul Kimoto @ 1998-06-21 20:19 UTC
  To: Linux MM

RECAP: In 2.1.99, 2.1.101, 2.1.103, and 2.1.104-pre1, my system has been
usable for only ~1 day with 32 MB of memory, or ~2.5 days with 48 MB.
Then my system has trouble forking, typically with EAGAIN.  The situation
can be alleviated temporarily by killing off a few processes, but the
errors always reappear soon thereafter.  I have sent in the results of
Shift-ScrollLock, which Rik thinks are not typical of excessive memory
fragmentation.

Now, I have scripts that run "ifconfig ppp0" hourly (to check whether PPP
is "UP").  Recently I joined the modern era by changing from net-tools
1.432 to 1.45.  The forking errors have gone away (at least for uptimes
twice the above).  When I changed these scripts to run "/sbin/ifconfig.old
ppp0" instead, they came back.

Running the old ifconfig (when the problem arises) would put "kmod: fork
failed, errno 11" messages in the logfiles.  The new ifconfig doesn't.
Running strace on "ifconfig ppp0" shows that the old version makes the
following system calls that the new one doesn't:

> socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument)
> socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument) 
> socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument)

(I am not sure whether these system calls have been taken out of the 
new ifconfig, or whether I merely configured net-tools to be ignorant
of appletalk, etc.)
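
The family numbers that strace could not name can be decoded with a
short Python sketch. The numbers 3, 4 and 5 are the Linux values for
AF_AX25, AF_IPX and AF_APPLETALK. They are hard-coded here so the
example does not depend on the platform it runs on. One possible
reading, which is an assumption and not confirmed anywhere in this
thread, is that each socket() call for a family that is not built in
makes the kernel try to load a module through kmod, and that this is
where the "kmod: fork failed, errno 11" messages come from (errno 11
is EAGAIN on Linux).

```python
import re

# Linux address-family numbers, hard-coded for portability of the example.
FAMILY_NAMES = {3: "AF_AX25", 4: "AF_IPX", 5: "AF_APPLETALK"}

# The strace lines from the message, with the "> " quote prefix removed.
TRACE = """\
socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument)
socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument)
socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument)
"""


def summarize(trace):
    """Map each address family name to the errno names it returned, in order."""
    result = {}
    pattern = r"socket\(PF_\?\?\? \((0x[0-9a-f]+)\).*= -1 (\w+)"
    for family_hex, err in re.findall(pattern, trace):
        number = int(family_hex, 16)
        name = FAMILY_NAMES.get(number, f"family {number}")
        result.setdefault(name, []).append(err)
    return result


summary = summarize(TRACE)
for name, errors in summary.items():
    print(name, errors)
```

Each of the three families is tried three times and fails every
time, so one run of the old ifconfig makes nine failing socket()
calls that the new one does not.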

Something about my old ifconfig must be triggering a bug (or hardware
error?) somewhere.  I am willing to take further suggestions for
experiments to try, if anyone is still interested.

	-Paul <kimoto@lightlink.com>
	 (please cc: relevant messages to me)
