2.5.42-mm2 hangs system


* 2.5.42-mm2 hangs system
@ 2002-10-13 16:04 Henrik Størner
  2002-10-13 21:03 ` William Lee Irwin III
       [not found] ` <3DA9CA28.155BA5CB@digeo.com>
  0 siblings, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-13 16:04 UTC (permalink / raw
  To: linux-mm

I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
while doing a kernel compile. The compile stopped dead in the middle
of a file, and there was no response when trying to access another
console (no X running). Alt-sysrq worked, so it wasn't completely dead
- sync/umount/reboot worked.

Nothing in the logs - no oops or other kernel messages.

Rebooted and repeated the experiment with the same result,
so it appears to be reproducible.

Stock 2.5.42 has worked OK for a day now, including kernel
compiles - the system has performed flawlessly for a 
couple of years as my normal workstation.

PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
Intel eepro/100 network adapter. Kernel config at
http://www.hswn.dk/config-2.5.42-mm2

-- 
Henrik Storner <henrik@hswn.dk> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/


* Re: 2.5.42-mm2 hangs system
  2002-10-13 16:04 2.5.42-mm2 hangs system Henrik Størner
@ 2002-10-13 21:03 ` William Lee Irwin III
       [not found] ` <3DA9CA28.155BA5CB@digeo.com>
  1 sibling, 0 replies; 21+ messages in thread
From: William Lee Irwin III @ 2002-10-13 21:03 UTC (permalink / raw
  To: Henrik Størner; +Cc: linux-mm

On Sun, Oct 13, 2002 at 06:04:51PM +0200, Henrik Størner wrote:
> I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> while doing a kernel compile. The compile stopped dead in the middle
> of a file, and there was no response when trying to access another
> console (no X running). Alt-sysrq worked, so it wasn't completely dead
> - sync/umount/reboot worked.
> Nothing in the logs - no oops or other kernel messages.
> Rebooted and repeated the experiment with the same result,
> so it appears to be reproducible.
> Stock 2.5.42 has worked OK for a day now, including kernel
> compiles - the system has performed flawlessly for a 
> couple of years as my normal workstation.
> PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> Intel eepro/100 network adapter. Kernel config at
> http://www.hswn.dk/config-2.5.42-mm2

Please reproduce and pass on the output from sysrq-t.


Bill

* Re: 2.5.42-mm2 hangs system
       [not found] ` <3DA9CA28.155BA5CB@digeo.com>
@ 2002-10-13 22:33   ` Henrik Størner
  2002-10-13 22:57     ` Andrew Morton
  2002-10-16 13:09     ` 2.5.42-mm2 hangs system Maneesh Soni
  0 siblings, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-13 22:33 UTC (permalink / raw
  To: linux-mm

On Sun, Oct 13, 2002 at 12:31:52PM -0700, Andrew Morton wrote:
> Henrik Storner wrote:
> > 
> > I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> > while doing a kernel compile. The compile stopped dead in the middle
> > of a file, and there was no response when trying to access another
> > console (no X running). Alt-sysrq worked, so it wasn't completely dead
> > - sync/umount/reboot worked.
> > 
> > Nothing in the logs - no oops or other kernel messages.
> > 
> > Rebooted and repeated the experiment with the same result,
> > so it appears to be reproducible.
> > 
> > Stock 2.5.42 has worked OK for a day now, including kernel
> > compiles - the system has performed flawlessly for a
> > couple of years as my normal workstation.
> > 
> > PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> > Intel eepro/100 network adapter. Kernel config at
> > http://www.hswn.dk/config-2.5.42-mm2
> 
> Very odd.
> 
> If you have time, could you please enable "load all symbols"
> in the kernel hacking menu and capture a sysrq-T trace?
> Thanks.

Did so - built it again from a fresh kernel tree, just to be sure.
Compiler is gcc 3.2 from Red Hat 8, by the way.

Bug is still there. sysrq-T scrolls off the screen too fast for me to
read, but the last screenful has several processes like this (could
see sh, make, sh, gcc):

Call Trace:
  sys_wait4+0x209/0x4d0
  default_wake_function+0x0/0x40
  default_wake_function+0x0/0x40
  syscall_call+0x7/0xb  

The last two tasks:

cc1  R  d4d74080  20  2232  2231     2233 (NOTLB)
Call Trace:
   work_resched+0x5/0x16

as   R  d3c778c0  24  2233  2231     2232 (NOTLB)
Call Trace:
   pipe_wait+0x98/0xe0
   default_wake_function+0x0/0x40
   default_wake_function+0x0/0x40
   pipe_read+0xf9/0x240
   vfs_read+0xdc/0x150
   sys_mmap2+0x9f/0xe0
   sys_read+0x3e/0x60
   syscall_call+0x7/0xb


I captured the ALT+ScrollLock output also:

Pid 1739, comm: nfsd
EIP 0060:c0160250   CPU:0
EIP is at d_lookup+0x70/0x160
   Eflags: 00000297     Not tainted
Call Trace
   cached_lookup+0x1b/0x70
   lookup_hash+0x72/0xe0
   lookup_one_len+0x5f/0x70
   find_exported_dentry+0x61f/0x730
   reiserfs_delete_solid_item+0xfd/0x2b0
   reiserfs_delete_solid_item+0xfd/0x2b0
   check_journal_end+0x18a/0x2b0
   rcu_check_callbacks+0x59/0x90
   schedule_tick+0x348/0x350
   update_process_times+0x46/0x60
   reiserfs_decode_fh+0xc2/0x100
   nfsd_acceptable+0x0/0xe0
   fh_verify+0x38e/0x570
   nfsd_acceptable+0x0/0xe0
   nfsd_statfs+0x2f/0x70
   nfsd3_proc_fsstat+0x37/0xc0
   nfs3svc_decode_fhandle+0x38/0xb0
   nfsd_dispatch+0xce/0x230
   svc_process+0x3f6/0x5e0
   nfsd+0x13f/0x250
   nfsd+0x0/0x250
   kernel_thread_helper+0x5/0x18
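
Each entry in the traces above has the form symbol+0xoffset/0xsize: the address lies offset bytes into a function that is size bytes long. As a minimal sketch (the name parse_trace_entry and the 64-byte symbol buffer are assumptions made for illustration, not part of any kernel tool), an entry could be split into its parts like this:

```c
#include <assert.h>
#include <stdio.h>

#define SYM_MAX 64  /* assumed upper bound on symbol length, for illustration */

/*
 * Split one call-trace entry such as "sys_wait4+0x209/0x4d0" into
 * symbol name, offset into the function, and function size.
 * Returns 0 on success, -1 if the entry does not match that shape.
 */
int parse_trace_entry(const char *entry, char sym[SYM_MAX],
                      unsigned long *offset, unsigned long *size)
{
    assert(entry != NULL && sym != NULL && offset != NULL && size != NULL);

    /* The leading space skips indentation; %63[^+] stops at the '+'. */
    if (sscanf(entry, " %63[^+]+0x%lx/0x%lx", sym, offset, size) != 3)
        return -1;
    return 0;
}
```

An offset close to the function size means the address is near the end of the function; an offset of 0x0 usually means the value on the stack is a function pointer rather than a return address.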


If you need the full sysrq-t output, I'll have to set up a serial
console to capture it.
 
-- 
Henrik Storner <henrik@hswn.dk> 

* Re: 2.5.42-mm2 hangs system
  2002-10-13 22:33   ` Henrik Størner
@ 2002-10-13 22:57     ` Andrew Morton
  2002-10-14 12:25       ` 2.5.42-mm2 on small systems Ed Tomlinson
  2002-10-16 13:09     ` 2.5.42-mm2 hangs system Maneesh Soni
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2002-10-13 22:57 UTC (permalink / raw
  To: Henrik Størner; +Cc: linux-mm

Henrik Storner wrote:
> 
> I captured the ALT+ScrollLock output also:
> 
> Pid 1739, comm: nfsd
> EIP 0060:c0160250   CPU:0
> EIP is at d_lookup+0x70/0x160
>    Eflags: 00000297     Not tainted
> Call Trace
>    cached_lookup+0x1b/0x70
>    lookup_hash+0x72/0xe0
>    lookup_one_len+0x5f/0x70
>    find_exported_dentry+0x61f/0x730
>    reiserfs_delete_solid_item+0xfd/0x2b0
>    reiserfs_delete_solid_item+0xfd/0x2b0
>    check_journal_end+0x18a/0x2b0
>    rcu_check_callbacks+0x59/0x90
>    schedule_tick+0x348/0x350
>    update_process_times+0x46/0x60
>    reiserfs_decode_fh+0xc2/0x100
>    nfsd_acceptable+0x0/0xe0
>    fh_verify+0x38e/0x570
>    nfsd_acceptable+0x0/0xe0
>    nfsd_statfs+0x2f/0x70
>    nfsd3_proc_fsstat+0x37/0xc0
>    nfs3svc_decode_fhandle+0x38/0xb0

OK.  This is possibly dentry hashtable corruption.  I saw one
instance of this in about 2.5.41-mm3, followed by two other
weird random memory corruptions.

So it could be that something in there is going for a memory
stomp.  Don't really know any more than that at this time.
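
To illustrate why a corrupted hash chain can show up as a hang rather than an oops: if an entry's next pointer is overwritten so that the chain loops back on itself, a lookup for a name that is not on the chain never reaches the end. The following is only a sketch with a hypothetical struct node, not the kernel's dentry code; it shows a cycle check and a lookup that gives up after a bounded number of steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on a hash chain. */
struct node {
    struct node *next;
    int key;
};

/* Floyd's tortoise-and-hare: returns 1 if the chain loops back on itself. */
int chain_has_cycle(const struct node *head)
{
    const struct node *slow = head, *fast = head;

    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
        if (slow == fast)
            return 1;
    }
    return 0;
}

/* Lookup that stops after max_steps entries instead of spinning forever. */
const struct node *chain_lookup(const struct node *head, int key,
                                size_t max_steps)
{
    assert(max_steps > 0);
    for (size_t i = 0; head != NULL && i < max_steps; i++, head = head->next)
        if (head->key == key)
            return head;
    return NULL;
}
```

An unbounded walk over a looped chain would spin with the CPU busy in the lookup function, which is at least consistent with the EIP sitting in d_lookup in the trace above.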

I _was_ suspecting oprofile or the latest addition to the shared
pagetable code.  But you're not using either.

It would be interesting to enable all the memory debugging options
under the kernel hacking menu, see if that turns anything up.

I'll build a kernel with your config and beat on reiserfs for a bit,
see if I can make it happen.

Apart from that, one way to isolate it is to just keep backing off
the patches until it goes away.  Which is not a ton of fun.
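
Backing off patches one at a time is linear in the number of patches. If the broken-out patches apply in a fixed order and a single patch is responsible, a binary search needs far fewer build-and-boot cycles. A minimal sketch under those assumptions follows; the callback type hangs_fn and the helper simulated_hang are hypothetical, and in practice the callback would stand for "build with the first n patches, boot, and run the kernel compile".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: nonzero if a kernel built with the first
 * n_applied patches (in series order) hangs under the test load. */
typedef int (*hangs_fn)(size_t n_applied, void *ctx);

/*
 * Binary search for the culprit. Assumes zero patches is known good,
 * and that once the culprit is applied every longer prefix hangs too.
 * Returns the 0-based index of the culprit, or n_patches if even the
 * full series does not hang.
 */
size_t first_bad_patch(size_t n_patches, hangs_fn hangs, void *ctx)
{
    size_t good = 0;          /* longest prefix known not to hang */
    size_t bad = n_patches;   /* shortest prefix known to hang */

    assert(hangs != NULL);
    if (n_patches == 0 || !hangs(n_patches, ctx))
        return n_patches;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (hangs(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad - 1;
}

/* Stand-in for a real build/boot/test cycle: ctx points at the index
 * of the culprit, and any prefix that includes it "hangs". */
int simulated_hang(size_t n_applied, void *ctx)
{
    size_t culprit = *(const size_t *)ctx;
    return n_applied > culprit;
}
```

The search only works if the hang reproduces reliably; an intermittent hang would need several test runs per step before a prefix is declared good.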

* 2.5.42-mm2 on small systems
  2002-10-13 22:57     ` Andrew Morton
@ 2002-10-14 12:25       ` Ed Tomlinson
  2002-10-14 14:34         ` Martin J. Bligh
  2002-10-15  6:42         ` Andrew Morton
  0 siblings, 2 replies; 21+ messages in thread
From: Ed Tomlinson @ 2002-10-14 12:25 UTC (permalink / raw
  To: Andrew Morton, Bill Davidsen; +Cc: linux-mm

Hi,

I have an old 486 with 64 MB of RAM and 512 MB of disk that I use as a serial console.
It does not have enough space to be useful for much else.  So I decided to
test the low end and tried it with 2.5.42-mm2.  It boots and seems to work 
fine.  Then I tried the resp1 (http://pages.prodigy.net/davidsen/) benchmark.  
With 2.4.18 it works:

Memory size 61 MB
  Starting 1 CPU run with 61 MB RAM, minimum 5 data points at 20 sec intervals
 
.              .          .          .          .          .        .        .
                       _____________ delay ms. ____________                  
           Test        low       high    median     average     S.D.    ratio
         noload   2128.527   2138.035   2129.915   2131.269    0.003    1.000
     smallwrite   4178.129  27436.634   4318.342  11111.745    8.927    5.214
     largewrite   4157.574  78592.200   4222.064  16336.681   24.926    7.665
        cpuload   6109.576   8018.156   6230.810   6425.307    0.600    3.015
      spawnload   5508.218   6934.219   5556.992   5706.077    0.462    2.677
       8ctx-mem  10090.974  22222.700  12662.532  13511.634    3.433    6.340
       2ctx-mem   9330.010  21106.194  10745.474  11650.974    3.612    5.467

With 2.5.42-mm2 it does not finish.  The machine is sort of usable while it's running
and Ctrl-C has no problem ending the program.  I waited 11 hours for the spawnload
test to complete - it was looking very good before this....

Memory size 61 MB
  Starting 1 CPU run with 61 MB RAM, minimum 5 data points at 20 sec intervals
 
.             .          .          .          .          .        .        .
                       _____________ delay ms. ____________                  
           Test        low       high    median     average     S.D.    ratio
         noload   2262.747   2269.895   2264.050   2264.796    0.002    1.000
     smallwrite   3797.901  12132.336   3875.934   5364.276    2.815    2.369
     largewrite   3857.445  35682.893   3875.064   8405.061   10.531    3.711
        cpuload   5385.148   7589.479   5514.157   5771.985    0.729    2.549

The box was not limited by IO (no swapping nor was there much bi/bo in
vmstat).  About 25% User and 75% system in cpu though.
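
Judging from the numbers in the two tables above, the ratio column appears to be each test's average delay divided by the noload average from the same run. That reading is inferred from the figures, not taken from the resp1 source, and the function name below is hypothetical:

```c
#include <assert.h>

/* Ratio of a loaded test's average delay to the unloaded ("noload")
 * average, both in milliseconds. */
double delay_ratio(double test_avg_ms, double noload_avg_ms)
{
    assert(noload_avg_ms > 0.0);
    return test_avg_ms / noload_avg_ms;
}
```

For the 2.4.18 smallwrite row, delay_ratio(11111.745, 2131.269) comes to roughly 5.21, in line with the 5.214 shown in the table.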

Ed



* Re: 2.5.42-mm2 on small systems
  2002-10-14 12:25       ` 2.5.42-mm2 on small systems Ed Tomlinson
@ 2002-10-14 14:34         ` Martin J. Bligh
  2002-10-14 21:24           ` Bill Davidsen
  2002-10-15  6:42         ` Andrew Morton
  1 sibling, 1 reply; 21+ messages in thread
From: Martin J. Bligh @ 2002-10-14 14:34 UTC (permalink / raw
  To: Ed Tomlinson, Andrew Morton, Bill Davidsen; +Cc: linux-mm

 
> I have an old 486 with 64m and 512M of disk that I use as a serial 
...
> with 2.5.42-mm2 it does not finish.  The machine is sort of usable
> while it's running and Ctrl-C has no problem ending the program.
> I waited 11 hours for the spawnload test to complete - it was 

What does spawnload do (for those of us who don't have the inclination
to go source diving)?

M.


* Re: 2.5.42-mm2 on small systems
  2002-10-14 14:34         ` Martin J. Bligh
@ 2002-10-14 21:24           ` Bill Davidsen
  0 siblings, 0 replies; 21+ messages in thread
From: Bill Davidsen @ 2002-10-14 21:24 UTC (permalink / raw
  To: Martin J. Bligh; +Cc: Ed Tomlinson, Andrew Morton, linux-mm

On Mon, 14 Oct 2002, Martin J. Bligh wrote:

>  
> > I have an old 486 with 64m and 512M of disk that I use as a serial 
> ...
> > with 2.5.42-mm2 it does not finish.  The machine is sort of usable
> > while it's running and Ctrl-C has no problem ending the program.
> > I waited 11 hours for the spawnload test to complete - it was 
> 
> What does spawnload do (for those of us who don't have the inclination
> to go source diving)?

In this case a half screen of source diving is the best answer: it forks a
process which fork/execs a shell, which either runs the builtin pwd or
/bin/pwd depending on what shell you have set. In most cases that's bash,
which uses the builtin. It does a bunch of process creation and cleanup, and
can generate some impressive context switching.

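    while (RunMe) {
        pid = fork();
        if (pid > 0) {
            /* parent: reap the child, then count the fork */
            (void)wait(NULL);
            NumFork++;
        } else if (pid == 0) {
            // Do a 2nd level fork/exec a few times
            system("pwd >/dev/null");
            exit(0);
        }
        /* pid < 0: fork failed, so just try again */
    }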
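
For anyone who wants to try the same load outside the benchmark, here is a self-contained sketch of the loop described above. It is not the resp1 source: the function name spawn_rounds and the fixed round count are assumptions, and it uses waitpid() and _exit() where the fragment uses wait() and exit().

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork `rounds` children one after another; each child runs `pwd`
 * through the shell (a second-level fork/exec) and exits.
 * Returns the number of children that were reaped with exit status 0.
 */
long spawn_rounds(long rounds)
{
    long completed = 0;

    assert(rounds >= 0);
    for (long i = 0; i < rounds; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                          /* fork failed: stop early */
        if (pid == 0) {
            /* child: system() forks again and execs the shell */
            int rc = system("pwd >/dev/null");
            _exit(rc == 0 ? 0 : 1);         /* _exit avoids re-flushing stdio */
        }
        int status;
        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            completed++;
    }
    return completed;
}
```

Each round costs two forks, one exec of the shell, and two waits, so the work is almost entirely process creation and teardown in the kernel rather than user-mode computation.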

I will say that I ran 41-mm2 and 41-mm2v (Con Kolivas' patch) just fine. I
can't get 2.5.42 anything to even build; it's looking for NLS and the config
has no NLS, unless I have a bad patch. I'm going to scan the list for
patches later, but that's my current experience.

The README (choose text, Postscript or HTML) has a description of what
each test does. Or what I think it does.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: 2.5.42-mm2 on small systems
  2002-10-14 12:25       ` 2.5.42-mm2 on small systems Ed Tomlinson
  2002-10-14 14:34         ` Martin J. Bligh
@ 2002-10-15  6:42         ` Andrew Morton
  2002-10-16 20:55           ` Bill Davidsen
  1 sibling, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2002-10-15  6:42 UTC (permalink / raw
  To: Ed Tomlinson; +Cc: Bill Davidsen, linux-mm

Ed Tomlinson wrote:
> 
> ...
> 
> with 2.5.42-mm2 it does not finish.  The machine is sort of usable while it's running
> and Ctrl-C has no problem ending the program.  I waited 11 hours for the spawnload
> test to complete - it was looking very good before this....
> 
> Memory size 61 MB
>   Starting 1 CPU run with 61 MB RAM, minimum 5 data points at 20 sec intervals
> 
> .             .          .          .          .          .        .        .
>                        _____________ delay ms. ____________
>            Test        low       high    median     average     S.D.    ratio
>          noload   2262.747   2269.895   2264.050   2264.796    0.002    1.000
>      smallwrite   3797.901  12132.336   3875.934   5364.276    2.815    2.369
>      largewrite   3857.445  35682.893   3875.064   8405.061   10.531    3.711
>         cpuload   5385.148   7589.479   5514.157   5771.985    0.729    2.549
> 
> The box was not limited by IO (no swapping nor was there much bi/bo in
> vmstat).  About 25% User and 75% system in cpu though.

hm.  Works for me.  The default settings are waaay too boring, so
I used ./resp -m2 -M5 -w5

           Test        low       high    median     average   median      avg
         noload    143.168    149.676    143.258    145.602    1.000    1.000
     smallwrite    144.319   4350.325    269.161   1428.881    1.879    9.814
     largewrite    230.759   1129.816    492.421    539.192    3.437    3.703
        cpuload    142.833    207.206    143.374    159.036    1.001    1.092
      spawnload    143.066    313.944    143.240    177.391    1.000    1.218
       8ctx-mem    159.396   5823.791    810.837   2020.066    5.660   13.874
       2ctx-mem    757.203   8192.148   1294.120   2538.975    9.033   17.438

Could be a scheduler thing?  Maybe a bug in the test?

* Re: 2.5.42-mm2 hangs system
  2002-10-13 22:33   ` Henrik Størner
  2002-10-13 22:57     ` Andrew Morton
@ 2002-10-16 13:09     ` Maneesh Soni
  2002-10-16 15:49       ` Henrik Størner
  1 sibling, 1 reply; 21+ messages in thread
From: Maneesh Soni @ 2002-10-16 13:09 UTC (permalink / raw
  To: Henrik Størner; +Cc: linux-mm, akpm, Dipankar Sarma

On Sun, Oct 13, 2002 at 10:34:40PM +0000, Henrik Storner wrote:
> On Sun, Oct 13, 2002 at 12:31:52PM -0700, Andrew Morton wrote:
> > Henrik Storner wrote:
> > > 
> > > I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> > > while doing a kernel compile. The compile stopped dead in the middle
> > > of a file, and there was no response when trying to access another
> > > console (no X running). Alt-sysrq worked, so it wasn't completely dead
> > > - sync/umount/reboot worked.
> > > 
> > > Nothing in the logs - no oops or other kernel messages.
> > > 
> > > Rebooted and repeated the experiment with the same result,
> > > so it appears to be reproducible.
> > > 
> > > Stock 2.5.42 has worked OK for a day now, including kernel
> > > compiles - the system has performed flawlessly for a
> > > couple of years as my normal workstation.
> > > 
> > > PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> > > Intel eepro/100 network adapter. Kernel config at
> > > http://www.hswn.dk/config-2.5.42-mm2
> > 
> > Very odd.
> > 
> > If you have time, could you please enable "load all symbols"
> > in the kernel hacking menu and capture a sysrq-T trace?
> > Thanks.
> 
> Did so - built it again from a fresh kernel tree, just to be sure.
> Compiler is gcc 3.2 from Red Hat 8, by the way.
> 
> Bug is still there. sysrq-T scrolls off the screen too fast for me to
> read, but the last screenful has several processes like this (could
> see sh, make, sh, gcc):
> 
> Call Trace:

Hello Henrik,

I tried recreating the hang, but it did not occur. I could guess from the
call trace that you are using reiserfs and nfs, but I am not very clear how
you are recreating it. I created a reiserfs partition and exported it. Then
I tried to compile a kernel over it. I used the config file from the site
you mentioned.

It would be nice if you could list the exact recreation steps, mentioning the
filesystems you are using.

As the hang looks like a loop in d_lookup, can you try
recreating it *without* dcache_rcu.patch? You can back out this patch:

http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch


Thanks
Maneesh

-- 
Maneesh Soni
IBM Linux Technology Center, 
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/

* Re: 2.5.42-mm2 hangs system
  2002-10-16 13:09     ` 2.5.42-mm2 hangs system Maneesh Soni
@ 2002-10-16 15:49       ` Henrik Størner
  2002-10-16 18:59         ` Henrik Størner
  2002-10-17 14:38         ` Maneesh Soni
  0 siblings, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-16 15:49 UTC (permalink / raw
  To: Maneesh Soni; +Cc: linux-mm, akpm, Dipankar Sarma

Hi Maneesh,

sorry about not getting back with more info sooner. Daytime jobs can
be all-consuming.

I tried doing what Andrew suggested, and enabling all memory debugging
options. This did not produce anything.

The setup here:

Workstation where I see the problem is a PII/350, 392 MB RAM and
some swap. Just about all the software packages are from Red Hat 8
(recently upgraded from a 7.x installation).

SCSI disk off a Symbios Logic 53c875 controller is used for Linux.
There is an IDE disk in the system and the kernel has support for it,
but it is not used normally (nothing mounted).

Network is with an Intel eepro100 adapter, gets an IP via DHCP.

root-fs is a local filesystem on the scsi disk, reiserfs formatted.
/home is NFS-mounted from a Linux server running kernel 2.4.19

The kernel sources are located in /usr/src which is on the local
(combined root+usr) filesystem, but I normally go there via a
symlink in my home-dir: ~/kernel/linux-2.5-mm/ is the directory
for the 2.5+mm tree I use.

The system runs apmd, atd, crond, autofs (for mounting /home), gpm,
lpd, nfs-server (the /usr/src directory is exported), nfs-client,
ntpd, portmap, sshd, xfs and xinetd. A DHCP client is also running.
No X server has been running while I've tested these hangs.

To recreate it, I've booted up the 2.5.42-mm2 kernel, starting up
all the normal services. Log in (automounts home directory),
cd ~/kernel/linux-2.5-mm, make oldconfig, make clean, make

The system then hangs after a few minutes of working through the
kernel compile. Not the same place every time.

I've got some time tonight, so I will try un-doing the patch you
mention and see if that changes anything.

Thanks,

Henrik

On Wed, Oct 16, 2002 at 06:39:07PM +0530, Maneesh Soni wrote:
> On Sun, Oct 13, 2002 at 10:34:40PM +0000, Henrik Storner wrote:
> > On Sun, Oct 13, 2002 at 12:31:52PM -0700, Andrew Morton wrote:
> > > Henrik Storner wrote:
> > > > 
> > > > I gave 2.5.42-mm2 a test run yesterday, and it hung the box solid
> > > > while doing a kernel compile. The compile stopped dead in the middle
> > > > of a file, and there was no response when trying to access another
> > > > console (no X running). Alt-sysrq worked, so it wasn't completely dead
> > > > - sync/umount/reboot worked.
> > > > 
> > > > Nothing in the logs - no oops or other kernel messages.
> > > > 
> > > > Rebooted and repeated the experiment with the same result,
> > > > so it appears to be reproducible.
> > > > 
> > > > Stock 2.5.42 has worked OK for a day now, including kernel
> > > > compiles - the system has performed flawlessly for a
> > > > couple of years as my normal workstation.
> > > > 
> > > > PII processor, 384 MB RAM, SCSI disk (ncr53c8xx driver),
> > > > Intel eepro/100 network adapter. Kernel config at
> > > > http://www.hswn.dk/config-2.5.42-mm2
> > > 
> > > Very odd.
> > > 
> > > If you have time, could you please enable "load all symbols"
> > > in the kernel hacking menu and capture a sysrq-T trace?
> > > Thanks.
> > 
> > Did so - built it again from a fresh kernel tree, just to be sure.
> > Compiler is gcc 3.2 from Red Hat 8, by the way.
> > 
> > Bug is still there. sysrq-T scrolls off the screen too fast for me to
> > read, but the last screenful has several processes like this (could
> > see sh, make, sh, gcc):
> > 
> > Call Trace:
> 
> Hello Henrik,
> 
> I tried recreating the hang, but it did not occur. I could guess from the
> call trace that you are using reiserfs and nfs, but I am not very clear how
> you are recreating it. I created a reiserfs partition and exported it. Then
> I tried to compile a kernel over it. I used the config file from the site
> you mentioned.
> 
> It would be nice if you could list the exact recreation steps, mentioning the
> filesystems you are using.
> 
> As the hang looks like a loop in d_lookup, can you try
> recreating it *without* dcache_rcu.patch? You can back out this patch:
> 
> http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch
> 
> 
> Thanks
> Maneesh
> 
> -- 
> Maneesh Soni
> IBM Linux Technology Center, 
> IBM India Software Lab, Bangalore.
> Phone: +91-80-5044999 email: maneesh@in.ibm.com
> http://lse.sourceforge.net/

-- 
Henrik Storner <henrik@hswn.dk> 
If you want good, reliable info about Open Source and Linux,
consider supporting Linux Weekly News with a subscription.
                                   http://lwn.net/Articles/10688/

* Re: 2.5.42-mm2 hangs system
  2002-10-16 15:49       ` Henrik Størner
@ 2002-10-16 18:59         ` Henrik Størner
  2002-10-16 19:31           ` Dipankar Sarma
  2002-10-30  9:48           ` [FIX] " Maneesh Soni
  2002-10-17 14:38         ` Maneesh Soni
  1 sibling, 2 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-16 18:59 UTC (permalink / raw
  To: Maneesh Soni; +Cc: linux-mm, akpm, Dipankar Sarma

Hi Maneesh,

On Wed, Oct 16, 2002 at 05:49:43PM +0200, Henrik Storner wrote:
> On Wed, Oct 16, 2002 at 06:39:07PM +0530, Maneesh Soni wrote:
> > As the hang looks like a loop in d_lookup, can you try
> > recreating it *without* dcache_rcu.patch? You can back out this patch:
> > 
> > http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch
> > 
> I've got some time tonight, so I will try un-doing the patch you
> mention and see if that changes anything.

Well, you hit the nail right on the head there.

I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
patch for a full hour, and I was unable to reproduce the hangs that I
saw with the full -mm2 patch installed. Did two full kernel builds
while reading some mail and doing other stuff - no problems
whatsoever.

Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
kernel, booted with the kernel containing dcache_rcu patch,
and the system died within a few minutes.

So it is definitely something in the dcache_rcu patch that does it.

-- 
Henrik Storner <henrik@hswn.dk> 


* Re: 2.5.42-mm2 hangs system
  2002-10-16 18:59         ` Henrik Størner
@ 2002-10-16 19:31           ` Dipankar Sarma
  2002-10-16 19:43             ` Andrew Morton
  2002-10-30  9:48           ` [FIX] " Maneesh Soni
  1 sibling, 1 reply; 21+ messages in thread
From: Dipankar Sarma @ 2002-10-16 19:31 UTC (permalink / raw
  To: Henrik Størner; +Cc: Maneesh Soni, linux-mm, akpm

On Wed, Oct 16, 2002 at 08:59:08PM +0200, Henrik Storner wrote:
> well you hit the nail right on the head there.
> 
> I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
> patch for a full hour, and I was unable to reproduce the hangs that I
> saw with the full -mm2 patch installed. Did two full kernel builds
> while reading some mail and doing other stuff - no problems
> whatsoever.
> 
> Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
> kernel, booted with the kernel containing dcache_rcu patch,
> and the system died within a few minutes.
> 
> So it is definitely something in the dcache_rcu patch that does it.

Well, I am not quite sure of this yet. Maneesh pointed this out earlier:
on this machine with 2.5.42-mm2 and no dcache_rcu (with your .config),
we see this:

[root@llm04 dbench]# df
Filesystem           1k-blocks      Used Available Use% Mounted on
/dev/sda6              1004024    461168    491852  49% /
/dev/sda1               505605     38348    441153   8% /boot
/dev/sda5              2514172   1791560    594900  76% /usr
none                    257532         0    257532   0% /dev/shm
/dev/sdb5              6324896     23996   5979604   1% /mnt/sdb5
llm04:/mnt/sdb5        6324896     23968   5979616   1% /mnt/sdc1
/dev/sda2              9068648   3993040   4614948  47% /home
[root@llm04 dbench]# pwd
/mnt/sdc1/dbench
[root@llm04 dbench]# ./dbench 4
4 clients started
..........................................................................................................................................rmdir CLIENTS/CLIENT2/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT2/~DMTMP/PARADOX failed (Directory not empty)
rmdir CLIENTS/CLIENT2/~DMTMP failed (Directory not empty)
+.......rmdir CLIENTS/CLIENT0/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT0/~DMTMP/PARADOX failed (Directory not empty)
.rmdir CLIENTS/CLIENT0/~DMTMP failed (Directory not empty)
+.rmdir CLIENTS/CLIENT3/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT3/~DMTMP/PARADOX failed (Directory not empty)
rmdir CLIENTS/CLIENT3/~DMTMP failed (Directory not empty)
+.rmdir CLIENTS/CLIENT1/~DMTMP/WORDPRO failed (Directory not empty)
rmdir CLIENTS/CLIENT1/~DMTMP/PARADOX failed (Directory not empty)
rmdir CLIENTS/CLIENT1/~DMTMP failed (Directory not empty)
+****
Throughput 36.6733 MB/sec (NB=45.8417 MB/sec  366.733 MBit/sec)

This needs more investigation. I would be really surprised if dcache_rcu
has any effect on UP code.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

* Re: 2.5.42-mm2 hangs system
  2002-10-16 19:31           ` Dipankar Sarma
@ 2002-10-16 19:43             ` Andrew Morton
  2002-10-16 20:05               ` Dipankar Sarma
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Morton @ 2002-10-16 19:43 UTC (permalink / raw
  To: dipankar; +Cc: Henrik Størner, Maneesh Soni, linux-mm

Dipankar Sarma wrote:
> 
> On Wed, Oct 16, 2002 at 08:59:08PM +0200, Henrik Storner wrote:
> > well you hit the nail right on the head there.
> >
> > I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
> > patch for a full hour, and I was unable to reproduce the hangs that I
> > saw with the full -mm2 patch installed. Did two full kernel builds
> > while reading some mail and doing other stuff - no problems
> > whatsoever.
> >
> > Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
> > kernel, booted with the kernel containing dcache_rcu patch,
> > and the system died within a few minutes.
> >
> > So it is definitely something in the dcache_rcu patch that does it.
> 
> Well, I am not quite sure of this yet. Maneesh pointed out this earlier -
> In this machine with 2.5.42-mm2 and no dcache_rcu, (with your .config),
> we see  this -
> 
> [root@llm04 dbench]# df
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/sda6              1004024    461168    491852  49% /
> /dev/sda1               505605     38348    441153   8% /boot
> /dev/sda5              2514172   1791560    594900  76% /usr
> none                    257532         0    257532   0% /dev/shm
> /dev/sdb5              6324896     23996   5979604   1% /mnt/sdb5
> llm04:/mnt/sdb5        6324896     23968   5979616   1% /mnt/sdc1
> /dev/sda2              9068648   3993040   4614948  47% /home
> [root@llm04 dbench]# pwd
> /mnt/sdc1/dbench
> [root@llm04 dbench]# ./dbench 4
> 4 clients started
> ..........................................................................................................................................rmdir CLIENTS/CLIENT2/~DMTMP/WORDPRO failed (Directory not empty)
> rmdir CLIENTS/CLIENT2/~DMTMP/PARADOX failed (Directory not empty)

Is this dbench-on-NFS?  That has always failed - it's to do
with the funny NFS handling of unlinked-while-open files.

* Re: 2.5.42-mm2 hangs system
  2002-10-16 19:43             ` Andrew Morton
@ 2002-10-16 20:05               ` Dipankar Sarma
  0 siblings, 0 replies; 21+ messages in thread
From: Dipankar Sarma @ 2002-10-16 20:05 UTC (permalink / raw
  To: Andrew Morton; +Cc: Henrik Størner, Maneesh Soni, linux-mm

On Wed, Oct 16, 2002 at 12:43:06PM -0700, Andrew Morton wrote:
> Is this dbench-on-NFS?  That has always failed - it's to do
> with the funny NFS handling of unlinked-while-open files.

Yes, it was.

I guess the thing to do would be to investigate NFS with dcache_rcu
and see where they don't mix. IIRC, this combination was tested a while ago,
maybe in the 2.5.2x timeframe. We'll see.

Thanks
-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.

* Re: 2.5.42-mm2 on small systems
  2002-10-15  6:42         ` Andrew Morton
@ 2002-10-16 20:55           ` Bill Davidsen
  2002-10-16 22:43             ` Ed Tomlinson
  0 siblings, 1 reply; 21+ messages in thread
From: Bill Davidsen @ 2002-10-16 20:55 UTC (permalink / raw
  To: Andrew Morton; +Cc: Ed Tomlinson, linux-mm

On Mon, 14 Oct 2002, Andrew Morton wrote:

> hm.  Works for me.  The default settings are waaay too boring, so
> I used ./resp -m2 -M5 -w5

The problem with reducing the sleep is that it hides a kernel which is
swappy, since there isn't time to build up a big backlog of disk writes,
and the swap doesn't seem to happen right away.

And I often see jackpot cases which are less likely to happen if you
reduce the number of tests. Again it makes the kernel look good, but may
not reflect what's really happening. I agree that it's slow, I've been
debugging it for several weeks now, but every time I think I've got the
corner cases cornered I find another corner.

The next version will add -R to set the retry max count, because some
kernels don't recover from one test and return no resources on fork()
because they haven't cleaned up all terminated processes.

This was intended to be a simple test of how the kernel feels, and it is
that, but some kernels I've tried get to one test or another and shit the
bed every time. It's not a stress test! How can I get my numbers if the
kernel keeps hanging solid? ;-)

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: 2.5.42-mm2 on small systems
  2002-10-16 20:55           ` Bill Davidsen
@ 2002-10-16 22:43             ` Ed Tomlinson
  0 siblings, 0 replies; 21+ messages in thread
From: Ed Tomlinson @ 2002-10-16 22:43 UTC (permalink / raw
  To: Bill Davidsen; +Cc: linux-mm

On October 16, 2002 04:55 pm, Bill Davidsen wrote:
> On Mon, 14 Oct 2002, Andrew Morton wrote:
> > hm.  Works for me.  The default settings are waaay too boring, so
> > I used ./resp -m2 -M5 -w5

> This was intended to be a simple test of how the kernel feels, and it is
> that, but some kernels I've tried get to one test or another and shit the
> bed every time. It's not a stress test! How can I get my numbers if the
> kernel keeps hanging solid? ;-)

You add sufficient tracing so you can find where it hangs...  And report it
so it can get fixed.  IMHO, while not a stress test, it can put stress on
the kernel - it needs to in order to test the interactive response.

Still trying to figure out what is happening on my 64m 486.

Thanks for the interesting benchmark.

Ed


* Re: 2.5.42-mm2 hangs system
  2002-10-16 15:49       ` Henrik Størner
  2002-10-16 18:59         ` Henrik Størner
@ 2002-10-17 14:38         ` Maneesh Soni
  2002-10-17 16:14           ` 2.5.43-mm2 gets network connection stuck Sebastian Benoit
  1 sibling, 1 reply; 21+ messages in thread
From: Maneesh Soni @ 2002-10-17 14:38 UTC (permalink / raw
  To: Henrik Størner; +Cc: linux-mm, akpm, Dipankar Sarma

On Wed, Oct 16, 2002 at 05:49:43PM +0200, Henrik Storner wrote:

> The kernel sources are located in /usr/src which is on the local
> (combined root+usr) filesystem, but I normally go there via a
> symlink in my home-dir, ~/kernel/linux-2.5-mm/ is the directory
> for the 2.5+mm directory I use.
> 
> The system runs apmd, atd, crond, autofs (for mounting /home), gpm,
> lpd, nfs-server (the /usr/src directory is exported), nfs-client,
> ntpd, portmap, sshd, xfs and xinetd. A DHCP client is also running.
> No X server has been running while I've tested these hangs.
> 
> To recreate it, I've booted up the 2.5.42-mm2 kernel, starting up
> all the normal services. Log in (automounts home directory), 
> cd ~/kernel/linux-2.5-mm, make oldconfig, make clean, make
> 
> The system then hangs after a few minutes of working through the
> kernel compile. Not the same place every time.


I tried a similar setup, that is, making a link on an NFS-mounted partition
that points to a local reiserfs partition. The NFS server was running on a
system with a 2.4.19 kernel. I had the following setup:

[root@llm04 root]# mount
/dev/sda6 on / type ext2 (rw)
none on /proc type proc (rw)
/dev/sda1 on /boot type ext2 (rw)
/dev/sda2 on /home type ext2 (rw)
/dev/sda5 on /usr type ext2 (rw)
none on /dev/shm type tmpfs (rw)
/dev/sdc3 on /mnt/sdc3 type reiserfs (rw)
/dev/sdb1 on /bm type ext2 (rw)
192.168.1.10:/home/maneesh/test on /mnt/sdc2 type nfs (rw,addr=192.168.1.10)

[root@llm04 tmp]# l
total 8
drwxr-xr-x    5 nfsnobod nfsnobod     4096 Oct 17 16:35 dbench
lrwxrwxrwx    1 root     root           10 Oct 17 16:08 dbench-link-to-ext2-local -> /bm/dbench
lrwxrwxrwx    1 root     root           17 Oct 17 15:03 dbench-link-to-rfs-local -> /mnt/sdc3/dbench/
lrwxrwxrwx    1 root     root           23 Oct 17 15:05 linux-2542-link-to-rfs-local -> /mnt/sdc3/linux-2.5.42/
drwxrwxr-x   17 1046     101          4096 Oct 17 14:39 linux-2.5.43
lrwxrwxrwx    1 root     root           19 Oct 17 15:08 linux-2543-link-to-ext2-local -> /src1/linux-2.5.43/

With this setup I could run make properly. dbench also runs fine if
run through the link.

The problem I am seeing occurs only when I run dbench directly on the
NFS-mounted partition (i.e., no symlink). dbench gives errors and
_sometimes_ hangs the system.

Whereas if I run the nfs-server on the same machine, as I did yesterday,
I see the hang occurring every time.

From your setup, I did not see that you need nfs-server running. So, just
to narrow down the problem, can you stop nfs-server and then do the make?

Thanks
Maneesh

-- 
Maneesh Soni
IBM Linux Technology Center, 
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/

* 2.5.43-mm2 gets network connection stuck
  2002-10-17 14:38         ` Maneesh Soni
@ 2002-10-17 16:14           ` Sebastian Benoit
  2002-10-17 17:22             ` Andrew Morton
  0 siblings, 1 reply; 21+ messages in thread
From: Sebastian Benoit @ 2002-10-17 16:14 UTC (permalink / raw
  To: Andrew Morton; +Cc: linux-mm

Hi, 

funny problem w. 2.5.43-mm2:

i'm running 2.5.43-mm2 on my workstation. Normal workload, X-windows, a few
xterms, editor, mozilla, etc. (host A)

I have a NFS/SAMBA-mount (both show the problem) to host B. Host B runs
2.4.19rc5aa1.

I can get a xterm, in which i have a ssh-connection to a third host C
'stuck' by simply cat'ing a large file from the NFS/SAMBA server to
/dev/null.

The xterm/ssh seems stuck, that is no key i press is received on the other
end, but output of the program running on host C is updated in the xterm. I
checked with tcpdump: the keypress does not generate a packet, my host only
sends ACK's on that ssh connection to host C.

The ssh-connection is not unstuck by stopping the data transfer from host B.

I checked that plain 2.5.42 and 2.5.43-mm1 do not have this problem: here my
input goes through to C. At least for small amounts of input, i did not test
anything beyond typing a few hundred chars.

recap:

 "mount /mnt/hostB"
 "ssh hostC" -> type random stuff in that connection
 at the same time do "cat /mnt/hostB/bigfile > /dev/null"
 ssh gets stuck.

hardware: PIII/600, 3c905B on 10baseT half-duplex

I'm sorry i can't do any further checks until Friday afternoon (MET).

/B.
-- 
Sebastian Benoit <benoit-lists@fb12.de>
My mail is GnuPG signed -- Unsigned ones are bogus -- http://www.gnupg.org/
GnuPG 0x5BA22F00 2001-07-31 2999 9839 6C9E E4BF B540  C44B 4EC4 E1BE 5BA2 2F00

Oxymoron #654: Fatally Injured


* Re: 2.5.43-mm2 gets network connection stuck
  2002-10-17 16:14           ` 2.5.43-mm2 gets network connection stuck Sebastian Benoit
@ 2002-10-17 17:22             ` Andrew Morton
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Morton @ 2002-10-17 17:22 UTC (permalink / raw
  To: Sebastian Benoit; +Cc: linux-mm

Sebastian Benoit wrote:
> 
> Hi,
> 
> funny problem w. 2.5.43-mm2:
> 

I saw something like that last night as well.  One ssh session
(sshd running on 2.5.43-mm2) just stopped doing anything.

The -mm patches always include Linus's current -bk snapshot,
and 2.5.43-mm2 has a lot of networking changes:

 net/core/dst.c                           |   25 
 net/ipv4/af_inet.c                       |   17 
 net/ipv4/icmp.c                          |    4 
 net/ipv4/ip_output.c                     |  880 ++++++++--
 net/ipv4/ip_proc.c                       |   74 
 net/ipv4/ip_sockglue.c                   |    4 
 net/ipv4/raw.c                           |    7 
 net/ipv4/tcp.c                           |   49 
 net/ipv4/tcp_ipv4.c                      |    6 
 net/ipv4/tcp_minisocks.c                 |   10 
 net/ipv4/udp.c                           |  296 +++

Looks like something may have broken there.

* [FIX] Re: 2.5.42-mm2 hangs system
  2002-10-16 18:59         ` Henrik Størner
  2002-10-16 19:31           ` Dipankar Sarma
@ 2002-10-30  9:48           ` Maneesh Soni
  2002-10-31  7:54             ` Henrik Størner
  1 sibling, 1 reply; 21+ messages in thread
From: Maneesh Soni @ 2002-10-30  9:48 UTC (permalink / raw
  To: Henrik Størner

Hello Henrik,

I hope the following patch solves your problem. The patch is made
against the 2.5.44-mm6 kernel. The problem was due to anonymous dentries
getting connected with the DCACHE_UNHASHED flag set.


diff -urN linux-2.5.44-mm6/fs/dcache.c linux-2.5.44-mm6-fix/fs/dcache.c
--- linux-2.5.44-mm6/fs/dcache.c	Wed Oct 30 14:42:33 2002
+++ linux-2.5.44-mm6-fix/fs/dcache.c	Wed Oct 30 13:13:43 2002
@@ -788,12 +788,15 @@
 		res = tmp;
 		tmp = NULL;
 		if (res) {
+			spin_lock(&res->d_lock);
 			res->d_sb = inode->i_sb;
 			res->d_parent = res;
 			res->d_inode = inode;
 			res->d_flags |= DCACHE_DISCONNECTED;
+			res->d_vfs_flags &= ~DCACHE_UNHASHED;
 			list_add(&res->d_alias, &inode->i_dentry);
 			list_add(&res->d_hash, &inode->i_sb->s_anon);
+			spin_unlock(&res->d_lock);
 		}
 		inode = NULL; /* don't drop reference */
 	}
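
As an illustration only (this is not kernel code), the mismatch the patch
addresses can be modeled in user space. The sketch assumes, as the fix
description suggests, that the "unhashed" test trusts a flag rather than
actual list membership. Every model_* name and MODEL_UNHASHED is
hypothetical and stands in for the real dcache structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's DCACHE_UNHASHED bit. */
#define MODEL_UNHASHED 0x1u

/* Toy dentry: a flag word plus a link in a singly linked "hash chain". */
struct model_dentry {
    unsigned int flags;
    struct model_dentry *next;
};

struct model_chain {
    struct model_dentry *head;
};

/* New entries start out flagged as unhashed (assumed allocation default). */
void model_init(struct model_dentry *d)
{
    d->flags = MODEL_UNHASHED;
    d->next = NULL;
}

/* Flag-based test: trusts the flag and never inspects the chain. */
bool model_unhashed(const struct model_dentry *d)
{
    return (d->flags & MODEL_UNHASHED) != 0;
}

/* Ground truth: walk the chain and look for the entry. */
bool model_on_chain(const struct model_chain *c, const struct model_dentry *d)
{
    const struct model_dentry *p;

    for (p = c->head; p != NULL; p = p->next)
        if (p == d)
            return true;
    return false;
}

/* Buggy insert: links the entry but leaves the stale flag set. */
void model_add_buggy(struct model_chain *c, struct model_dentry *d)
{
    d->next = c->head;
    c->head = d;
}

/* Fixed insert: clears the flag at the moment the entry is linked. */
void model_add_fixed(struct model_chain *c, struct model_dentry *d)
{
    d->flags &= ~MODEL_UNHASHED;
    d->next = c->head;
    c->head = d;
}

/* Drop: unlinks the entry only when the flag says it is hashed. */
void model_drop(struct model_chain *c, struct model_dentry *d)
{
    struct model_dentry **pp;

    if (model_unhashed(d))
        return; /* flag claims there is nothing to unlink */

    for (pp = &c->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == d) {
            *pp = d->next;
            break;
        }
    }
    d->next = NULL;
    d->flags |= MODEL_UNHASHED;
}
```

In this model, an entry added with model_add_buggy() survives model_drop()
and stays on the chain, so a later walk can still reach an entry its owner
believes is gone. With model_add_fixed() the flag and the chain agree, and
the drop really unlinks the entry.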


Regards,
Maneesh


On Wed, Oct 16, 2002 at 07:03:14PM +0000, Henrik Storner wrote:
> Hi Maneesh,
> 
> On Wed, Oct 16, 2002 at 05:49:43PM +0200, Henrik Storner wrote:
> > On Wed, Oct 16, 2002 at 06:39:07PM +0530, Maneesh Soni wrote:
> > > As the hang looks like a loop in d_lookup can you  try
> > > recreating it *without* dcache_rcu.patch. You can backout this patch
> > > 
> > > http://www.zipworld.com.au/~akpm/linux/patches/2.5/2.5.42/2.5.42-mm2/broken-out/dcache_rcu.patch
> > > 
> > I've got some time tonight, so I will try un-doing the patch you
> > mention and see if that changes anything.
> 
> well you hit the nail right on the head there.
> 
> I've just been running the 2.5.42-mm2 kernel except for the dcache_rcu
> patch for a full hour, and I was unable to reproduce the hangs that I
> saw with the full -mm2 patch installed. Did two full kernel builds
> while reading some mail and doing other stuff - no problems
> whatsoever.
> 
> Just to be sure, I re-applied the dcache_rcu patch, rebuilt the
> kernel, booted with the kernel containing dcache_rcu patch,
> and the system died within a few minutes.
> 
> So it is definitely something in the dcache_rcu patch that does it.
> 
> -- 
> Henrik Storner <henrik@hswn.dk> 

-- 
Maneesh Soni
IBM Linux Technology Center, 
IBM India Software Lab, Bangalore.
Phone: +91-80-5044999 email: maneesh@in.ibm.com
http://lse.sourceforge.net/

* Re: [FIX] Re: 2.5.42-mm2 hangs system
  2002-10-30  9:48           ` [FIX] " Maneesh Soni
@ 2002-10-31  7:54             ` Henrik Størner
  0 siblings, 0 replies; 21+ messages in thread
From: Henrik Størner @ 2002-10-31  7:54 UTC (permalink / raw
  To: Maneesh Soni; +Cc: linux-mm

Hi Maneesh,

On Wed, Oct 30, 2002 at 03:18:46PM +0530, Maneesh Soni wrote:
> Hello Henrik,
> 
> I hope the following patch solves your problem. The patch is made
> against the 2.5.44-mm6 kernel. The problem was due to anonymous dentries
> getting connected with the DCACHE_UNHASHED flag set.

the patch does fix the sudden halts that I was seeing with
2.5.42-mm2. The system has now survived about 10 successive kernel
compiles and it is still running.

There are a couple of odd things going on, though - but I don't know
for sure if they are related to the mm patch or not. I am seeing these
messages regularly - disk activity seems to provoke them.

Oct 30 23:14:44 osiris kernel: bad: scheduling while atomic!
Oct 30 23:14:44 osiris kernel: Call Trace:
Oct 30 23:14:44 osiris kernel:  [do_schedule+763/768] do_schedule+0x2fb/0x300
Oct 30 23:14:44 osiris kernel:  [<c011973b>] do_schedule+0x2fb/0x300
Oct 30 23:14:44 osiris kernel:  [kswapd+236/284] kswapd+0xec/0x11c
Oct 30 23:14:44 osiris kernel:  [<c013bd9c>] kswapd+0xec/0x11c
Oct 30 23:14:44 osiris kernel:  [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel:  [<c011ae70>] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel:  [preempt_schedule+54/80] preempt_schedule+0x36/0x50
Oct 30 23:14:44 osiris kernel:  [<c0119776>] preempt_schedule+0x36/0x50
Oct 30 23:14:44 osiris kernel:  [autoremove_wake_function+0/80] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel:  [<c011ae70>] autoremove_wake_function+0x0/0x50
Oct 30 23:14:44 osiris kernel:  [kswapd+0/284] kswapd+0x0/0x11c
Oct 30 23:14:44 osiris kernel:  [<c013bcb0>] kswapd+0x0/0x11c
Oct 30 23:14:44 osiris kernel:  [kernel_thread_helper+5/24] kernel_thread_helper+0x5/0x18
Oct 30 23:14:44 osiris kernel:  [<c01074cd>] kernel_thread_helper+0x5/0x18
Oct 30 23:14:44 osiris kernel: 

And one full-blown Oops, apparently when I tried to log in to an X
session (I use KDE for the desktop):

Oct 31 08:38:11 osiris kernel: Unable to handle kernel paging request at virtual address 4172f058
Oct 31 08:38:11 osiris kernel:  printing eip:
Oct 31 08:38:11 osiris kernel: 083b80d4
Oct 31 08:38:11 osiris kernel: *pde = 06437067
Oct 31 08:38:11 osiris kernel: *pte = 00000000
Oct 31 08:38:11 osiris kernel: Oops: 0006
Oct 31 08:38:11 osiris kernel: eepro100 mii sb sb_lib uart401 sound soundcore  
Oct 31 08:38:11 osiris kernel: CPU:    0
Oct 31 08:38:11 osiris kernel: EIP:    0023:[serport_exit+138115172/-1072695408]    Not tainted
Oct 31 08:38:11 osiris kernel: EIP:    0023:[<083b80d4>]    Not tainted
Oct 31 08:38:11 osiris kernel: EFLAGS: 00013206
Oct 31 08:38:11 osiris kernel: eax: 0021449c   ebx: 4172f058   ecx: 00000000   edx: 00000000
Oct 31 08:38:11 osiris kdm[8787]: Server for display :0 terminated unexpectedly
Oct 31 08:38:11 osiris kernel: esi: 088674dc   edi: 0021449c   ebp: 00000002   esp: bffff58c
Oct 31 08:38:12 osiris kernel: ds: 002b   es: 002b   ss: 002b
Oct 31 08:38:12 osiris kernel: Process X (pid: 25678, threadinfo=d1f54000 task=d675cce0)
Oct 31 08:38:12 osiris kernel:  <6>note: X[25678] exited with preempt_count 2
Oct 31 08:38:12 osiris kernel: Debug: sleeping function called from illegal context at include/asm/semaphore.h:119
Oct 31 08:38:12 osiris kernel: Call Trace:
Oct 31 08:38:12 osiris kernel:  [shm_close+48/192] shm_close+0x30/0xc0
Oct 31 08:38:12 osiris kernel:  [<c0200190>] shm_close+0x30/0xc0
Oct 31 08:38:12 osiris kernel:  [exit_mmap+214/224] exit_mmap+0xd6/0xe0
Oct 31 08:38:12 osiris kernel:  [<c0133146>] exit_mmap+0xd6/0xe0
Oct 31 08:38:12 osiris kernel:  [mmput+78/160] mmput+0x4e/0xa0
Oct 31 08:38:12 osiris kernel:  [<c011b10e>] mmput+0x4e/0xa0
Oct 31 08:38:12 osiris kernel:  [do_exit+197/688] do_exit+0xc5/0x2b0
Oct 31 08:38:12 osiris kernel:  [<c0120aa5>] do_exit+0xc5/0x2b0
Oct 31 08:38:12 osiris kernel:  [die+134/144] die+0x86/0x90
Oct 31 08:38:12 osiris kernel:  [<c010a456>] die+0x86/0x90
Oct 31 08:38:12 osiris kernel:  [do_page_fault+358/1268] do_page_fault+0x166/0x4f4
Oct 31 08:38:12 osiris kernel:  [<c0118006>] do_page_fault+0x166/0x4f4
Oct 31 08:38:12 osiris kernel:  [vfs_read+230/320] vfs_read+0xe6/0x140
Oct 31 08:38:12 osiris kernel:  [<c0149cf6>] vfs_read+0xe6/0x140
Oct 31 08:38:12 osiris kernel:  [sys_setitimer+86/192] sys_setitimer+0x56/0x160
Oct 31 08:38:12 osiris kernel:  [<c0121c16>] sys_setitimer+0x56/0x160
Oct 31 08:38:12 osiris kernel:  [sys_read+69/96] sys_read+0x45/0x60
Oct 31 08:38:12 osiris kernel:  [<c0149f95>] sys_read+0x45/0x60
Oct 31 08:38:12 osiris kernel:  [do_page_fault+0/1268] do_page_fault+0x0/0x4f4
Oct 31 08:38:12 osiris kernel:  [<c0117ea0>] do_page_fault+0x0/0x4f4
Oct 31 08:38:12 osiris kernel:  [error_code+45/56] error_code+0x2d/0x38
Oct 31 08:38:12 osiris kernel:  [<c0109e75>] error_code+0x2d/0x38
Oct 31 08:38:12 osiris kernel: 

-- 
Henrik Storner <henrik@hswn.dk> 


end of thread, other threads:[~2002-10-31  7:54 UTC | newest]

Thread overview: 21+ messages
2002-10-13 16:04 2.5.42-mm2 hangs system Henrik Størner
2002-10-13 21:03 ` William Lee Irwin III
     [not found] ` <3DA9CA28.155BA5CB@digeo.com>
2002-10-13 22:33   ` Henrik Størner
2002-10-13 22:57     ` Andrew Morton
2002-10-14 12:25       ` 2.5.42-mm2 on small systems Ed Tomlinson
2002-10-14 14:34         ` Martin J. Bligh
2002-10-14 21:24           ` Bill Davidsen
2002-10-15  6:42         ` Andrew Morton
2002-10-16 20:55           ` Bill Davidsen
2002-10-16 22:43             ` Ed Tomlinson
2002-10-16 13:09     ` 2.5.42-mm2 hangs system Maneesh Soni
2002-10-16 15:49       ` Henrik Størner
2002-10-16 18:59         ` Henrik Størner
2002-10-16 19:31           ` Dipankar Sarma
2002-10-16 19:43             ` Andrew Morton
2002-10-16 20:05               ` Dipankar Sarma
2002-10-30  9:48           ` [FIX] " Maneesh Soni
2002-10-31  7:54             ` Henrik Størner
2002-10-17 14:38         ` Maneesh Soni
2002-10-17 16:14           ` 2.5.43-mm2 gets network connection stuck Sebastian Benoit
2002-10-17 17:22             ` Andrew Morton
