All the mail mirrored from lore.kernel.org
* Zombie process when ptracing
From: Nick Piggin @ 2009-11-19 10:25 UTC
  To: Linux Kernel Mailing List, Roland McGrath, Oleg Nesterov

Hi,

Running recent git kernel, I have a process stuck in Z state

bash          ? 0000000000000000     0  3188   3187 0x00000000
 ffff88012e24fec8 0000000000000046 0000000000000000 0000000000000012
 ffff88012e24fec8 ffff88012e24e000 ffff88012e24ffd8 ffff88012e24e000
 000000000000efc8 ffff88012e24e000 ffff88012ea82090 ffff88012ff78640
Call Trace:
 [<ffffffff8124baee>] ? proc_clear_tty+0x5e/0x70
 [<ffffffff810587a8>] ? exit_ptrace+0xb8/0x140
 [<ffffffff8105126a>] do_exit+0x58a/0x7c0
 [<ffffffff810514dd>] do_group_exit+0x3d/0xb0
 [<ffffffff81051562>] sys_exit_group+0x12/0x20
 [<ffffffff8100b3eb>] system_call_fastpath+0x16/0x1b

This was after stracing a few test programs.

It also seems to have lost job control (^C) at the same time.

Hmm, and the kernel just panicked with an NMI lockup while I was
trying to get more info.

Call Trace:
 <IRQ>
 [<ffffffff811e5aa3>] __const_udelay+0x43/0x50
 [<ffffffff810261bc>] arch_trigger_all_cpu_backtrace+0x4c/0x70
 [<ffffffff81264f79>] sysrq_handle_showallcpus+0x9/0x10
 [<ffffffff81264d10>] __handle_sysrq+0x120/0x180
 [<ffffffff81264de6>] handle_sysrq+0x26/0x30
 [<ffffffff81275af0>] serial8250_handle_port+0x210/0x2f0
 [<ffffffff81275c58>] serial8250_interrupt+0x88/0x120
 [<ffffffff810872e7>] handle_IRQ_event+0xa7/0x1e0
 [<ffffffff810891cc>] handle_edge_irq+0xbc/0x150
 [<ffffffff8100e2df>] handle_irq+0x1f/0x30
 [<ffffffff8100d86a>] do_IRQ+0x6a/0xe0
 [<ffffffff8100bc93>] ret_from_intr+0x0/0xa
 <EOI>
 [<ffffffff81013842>] ? default_idle+0xa2/0xc0
 [<ffffffff8106f8d1>] ? __atomic_notifier_call_chain+0x31/0x60
 [<ffffffff81013b1a>] ? c1e_idle+0x3a/0x100
 [<ffffffff8106f911>] ? atomic_notifier_call_chain+0x11/0x20
 [<ffffffff8100a66b>] ? cpu_idle+0x6b/0xc0
 [<ffffffff81439dc6>] ? start_secondary+0x17c/0x1d6

I'll update this space if I can repeat it again. Any ideas?



* Re: Zombie process when ptracing
From: Oleg Nesterov @ 2009-11-20  1:29 UTC
  To: Nick Piggin; +Cc: Linux Kernel Mailing List, Roland McGrath

Hi,

On 11/19, Nick Piggin wrote:
>
> Running recent git kernel, I have a process stuck in Z state
>
> bash          ? 0000000000000000     0  3188   3187 0x00000000
>  ffff88012e24fec8 0000000000000046 0000000000000000 0000000000000012
>  ffff88012e24fec8 ffff88012e24e000 ffff88012e24ffd8 ffff88012e24e000
>  000000000000efc8 ffff88012e24e000 ffff88012ea82090 ffff88012ff78640
> Call Trace:
>  [<ffffffff8124baee>] ? proc_clear_tty+0x5e/0x70
>  [<ffffffff810587a8>] ? exit_ptrace+0xb8/0x140
>  [<ffffffff8105126a>] do_exit+0x58a/0x7c0
>  [<ffffffff810514dd>] do_group_exit+0x3d/0xb0
>  [<ffffffff81051562>] sys_exit_group+0x12/0x20
>  [<ffffffff8100b3eb>] system_call_fastpath+0x16/0x1b
>
> This was after stracing a few test programs.
>
> It also seems to have lost job control (^C) at the same time.

This can happen if the tracer (strace) itself hangs; zombies
should go away once the tracer is killed. Or if the zombie's
->real_parent is stopped or hangs...

(I assume you didn't strace /sbin/init)
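
To illustrate the reaping behaviour, here is a minimal Python sketch,
assuming Linux with /proc mounted. It covers only the plain parent/child
case, not ptrace: a child that has exited stays in Z state until whoever
is responsible for it calls wait(), and disappears afterwards. The helper
names (proc_state, wait_for_state, demo) are made up for illustration.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the task is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The comm field is parenthesised and may contain spaces,
    # so the state is the first token after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until the task reaches the wanted state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False


def demo():
    pid = os.fork()
    if pid == 0:
        os._exit(0)                      # child exits at once
    # Nobody has called wait() yet, so the child should show up as a zombie.
    became_zombie = wait_for_state(pid, "Z")
    os.waitpid(pid, 0)                   # reap it
    # After reaping, the /proc entry should be gone.
    return became_zombie, proc_state(pid)


if __name__ == "__main__":
    print(demo())
```

If the waiter never gets to call wait() (because it hangs or is stopped),
the task stays in the first state indefinitely.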

But,

> Hmm, and the kernel just panicked with an NMI lockup while I was
> trying to get more info.

this probably means we have a kernel bug ;)

If you see a zombie again, could you look at its /proc/pid/status?


And of course, which programs did you trace and how? It would be
great if we can reproduce the problem.

Oleg.



* Re: Zombie process when ptracing
From: Nick Piggin @ 2009-11-23  8:36 UTC
  To: Oleg Nesterov; +Cc: Linux Kernel Mailing List, Roland McGrath

On Fri, Nov 20, 2009 at 02:29:30AM +0100, Oleg Nesterov wrote:
> Hi,
> 
> On 11/19, Nick Piggin wrote:
> >
> > Running recent git kernel, I have a process stuck in Z state
> >
> > bash          ? 0000000000000000     0  3188   3187 0x00000000
> >  ffff88012e24fec8 0000000000000046 0000000000000000 0000000000000012
> >  ffff88012e24fec8 ffff88012e24e000 ffff88012e24ffd8 ffff88012e24e000
> >  000000000000efc8 ffff88012e24e000 ffff88012ea82090 ffff88012ff78640
> > Call Trace:
> >  [<ffffffff8124baee>] ? proc_clear_tty+0x5e/0x70
> >  [<ffffffff810587a8>] ? exit_ptrace+0xb8/0x140
> >  [<ffffffff8105126a>] do_exit+0x58a/0x7c0
> >  [<ffffffff810514dd>] do_group_exit+0x3d/0xb0
> >  [<ffffffff81051562>] sys_exit_group+0x12/0x20
> >  [<ffffffff8100b3eb>] system_call_fastpath+0x16/0x1b
> >
> > This was after stracing a few test programs.
> >
> > It also seems to have lost job control (^C) at the same time.
> 
> This can happen if the tracer (strace) itself hangs; zombies
> should go away once the tracer is killed. Or if the zombie's
> ->real_parent is stopped or hangs...
> 
> (I assume you didn't strace /sbin/init)

No, I straced something else, and all straces seemed to be
killed but bash remained. I was running a script that in
turn launched another process, so I ran it via
strace -ff bash ./script.sh


> But,
> 
> > Hmm, and the kernel just panicked with an NMI lockup while I was
> > trying to get more info.
> 
> this probably means we have a kernel bug ;)

Hmm, sorry, that seems like it _may_ have been an unrelated issue
(with the ssh connection).

 
> If you see a zombie again, could you look at its /proc/pid/status?

OK, any other hints if I see it again?


> And of course, which programs did you trace and how? It would be
> great if we can reproduce the problem.

At this stage I have not reproduced it, and I can't share the program
which was being straced. If it does happen again and I cannot distil
a simple test case, I will ask permission to distribute it.

Thanks,
Nick


* Re: Zombie process when ptracing
From: Oleg Nesterov @ 2009-11-23 15:16 UTC
  To: Nick Piggin; +Cc: Linux Kernel Mailing List, Roland McGrath

On 11/23, Nick Piggin wrote:
>
> On Fri, Nov 20, 2009 at 02:29:30AM +0100, Oleg Nesterov wrote:
> > Hi,
> >
> > On 11/19, Nick Piggin wrote:
> > >
> > > Running recent git kernel, I have a process stuck in Z state
> > >
> > > bash          ? 0000000000000000     0  3188   3187 0x00000000
> > >  ffff88012e24fec8 0000000000000046 0000000000000000 0000000000000012
> > >  ffff88012e24fec8 ffff88012e24e000 ffff88012e24ffd8 ffff88012e24e000
> > >  000000000000efc8 ffff88012e24e000 ffff88012ea82090 ffff88012ff78640
> > > Call Trace:
> > >  [<ffffffff8124baee>] ? proc_clear_tty+0x5e/0x70
> > >  [<ffffffff810587a8>] ? exit_ptrace+0xb8/0x140
> > >  [<ffffffff8105126a>] do_exit+0x58a/0x7c0
> > >  [<ffffffff810514dd>] do_group_exit+0x3d/0xb0
> > >  [<ffffffff81051562>] sys_exit_group+0x12/0x20
> > >  [<ffffffff8100b3eb>] system_call_fastpath+0x16/0x1b
> > >
> > > This was after stracing a few test programs.
> > >
> > > It also seems to have lost job control (^C) at the same time.
> >
> > This can happen if the tracer (strace) itself hangs; zombies
> > should go away once the tracer is killed. Or if the zombie's
> > ->real_parent is stopped or hangs...
> >
> > (I assume you didn't strace /sbin/init)
>
> No, I straced something else, and all straces seemed to be
> killed but bash remained. I was running a script that in
> turn launched another process, so I ran it via
> strace -ff bash ./script.sh

OK, thanks.

Hmm. Just noticed the state above == '?'. Looks like sched_show_task()
is buggy, it should check ->exit_state for "ZX" from TASK_STATE_TO_CHAR_STR.
But this is off-topic.

> > If you see a zombie again, could you look at its /proc/pid/status?
>
> OK, any other hints if I see it again?

Well, also the contents of /proc/PPid/status and /proc/TracerPid/status
may help. And sysrq-t output. Otherwise, currently I have no idea where
to start.
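
Something along these lines could gather the three status files in one
go. This is only a sketch, assuming Linux with /proc mounted and readable
status files; read_status and report are hypothetical helper names, and
the only fields relied on are the PPid and TracerPid lines of
/proc/pid/status.

```python
import os


def read_status(pid):
    """Parse /proc/<pid>/status into a dict mapping field name to value string."""
    fields = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:
                fields[key] = value.strip()
    return fields


def report(pid=None):
    """Collect the status of a task, its parent (PPid) and its tracer (TracerPid)."""
    pid = os.getpid() if pid is None else pid
    task = read_status(pid)
    result = {"task": task}
    for role, key in (("parent", "PPid"), ("tracer", "TracerPid")):
        other = int(task.get(key, "0"))
        if other > 0:                    # 0 means "no parent visible" / "not traced"
            try:
                result[role] = read_status(other)
            except OSError:
                result[role] = None      # vanished or unreadable
    return result


if __name__ == "__main__":
    for role, status in report().items():
        state = status.get("State") if status else "unavailable"
        print(role, state)
```

For the zombie in question one would call report(3188) and look at the
State lines of the parent and tracer entries, which should show whether
either of them is stopped.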

Oleg.


