trinity.vger.kernel.org archive mirror
From: Jan Stancek <jstancek@redhat.com>
To: Dave Jones <davej@codemonkey.org.uk>
Cc: trinity@vger.kernel.org
Subject: Re: [bug] child processes stall forever and don't get killed
Date: Tue, 13 Sep 2016 08:00:51 -0400 (EDT)
Message-ID: <891325855.260183.1473768051743.JavaMail.zimbra@redhat.com>
In-Reply-To: <20160910014630.kszwikfkznrpzqic@codemonkey.org.uk>



----- Original Message -----
> From: "Dave Jones" <davej@codemonkey.org.uk>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: trinity@vger.kernel.org
> Sent: Saturday, 10 September, 2016 3:46:30 AM
> Subject: Re: [bug] child processes stall forever and don't get killed
> 
> On Fri, Sep 09, 2016 at 10:16:17AM -0400, Jan Stancek wrote:
>  
>  > >  > I'm seeing more or less the opposite of what the commit above says.
>  > >  > Most CPUs are idle, because N-1 children are stuck in recv/read/...
>  > >  > and the last child manages to keep going. Then by chance it also hits
>  > >  > a syscall that doesn't complete and the system stays idle
>  > >  > (after about an hour I gave up waiting).
>  > > 
>  > > Need to think some more on this, but as a quick guess...
>  > > try replacing the <= BEFORE with < BEFORE
>  > 
>  > I've started a new test with the patch above reverted and that looks
>  > good so far. No stalls after 1 hour. Previously it stalled after ~20-30
>  > minutes. I noticed it when the syscall stat messages (those which show
>  > the number of iterations) stopped appearing.
> 
> Ok, I committed that, but with a minor change to widen how long we spend
> in BEFORE state slightly. I doubt that part will have a negative effect,
> but holler if it does..

I applied this patch and haven't seen any stalls in an overnight test.

Thanks,
Jan

> 
>  > > I'll try and find some time to look into this soon. I'm surprised I
>  > > haven't also seen it happen though.  How many CPUs & how many child
>  > > processes ?
>  > 
>  > Anywhere from 2-8 CPUs and 8-32 children, on x86_64, ppc64le and s390x
>  > systems (RHEL7.3 Beta). It usually happened within 20-30 minutes.
> 
> Weird. I'm doing 24/7 runs on one quad core and didn't hit it.
> But I wonder if I was just fortunate enough that I had some children
> always making progress even if N-1 were stuck.
> 
> 	Dave
> 
> 
