From: Jan Stancek <jstancek@redhat.com>
To: Dave Jones <davej@codemonkey.org.uk>
Cc: trinity@vger.kernel.org
Subject: Re: [bug] child processes stall forever and don't get killed
Date: Tue, 13 Sep 2016 08:00:51 -0400 (EDT)
Message-ID: <891325855.260183.1473768051743.JavaMail.zimbra@redhat.com>
In-Reply-To: <20160910014630.kszwikfkznrpzqic@codemonkey.org.uk>
----- Original Message -----
> From: "Dave Jones" <davej@codemonkey.org.uk>
> To: "Jan Stancek" <jstancek@redhat.com>
> Cc: trinity@vger.kernel.org
> Sent: Saturday, 10 September, 2016 3:46:30 AM
> Subject: Re: [bug] child processes stall forever and don't get killed
>
> On Fri, Sep 09, 2016 at 10:16:17AM -0400, Jan Stancek wrote:
>
> > > > I'm seeing more the opposite of what the commit above says. Most CPUs
> > > > are idle, because N-1 children are stuck in recv/read/...
> > > > and the last child manages to keep going. Then, by chance, it also hits
> > > > a syscall that doesn't complete and the system stays idle
> > > > (after ~an hour I gave up waiting).
> > >
> > > Need to think some more on this, but as a quick guess...
> > > try replacing the <= BEFORE with < BEFORE
> >
> > I've started a new test with the patch above reverted, and that looks
> > good so far. No stalls after 1 hour. Previously it stalled after ~20-30
> > minutes, which I noticed when the syscall stat messages (those that show
> > the iteration count) stopped appearing.
>
> Ok, I committed that, but with a minor change to widen how long we spend
> in BEFORE state slightly. I doubt that part will have a negative effect,
> but holler if it does..
I applied this patch and haven't seen any stalls in an overnight test.
Thanks,
Jan
>
> > > I'll try and find some time to look into this soon. I'm surprised I
> > > haven't also seen it happen though. How many CPUs & how many child
> > > processes ?
> >
> > Anywhere from 2-8 CPUs, 8-32 children on x86_64, ppc64le and s390x
> > systems (RHEL7.3 Beta). It happened usually within 20-30 minutes.
>
> Weird. I'm doing 24/7 runs on one quad core and didn't hit it.
> But I wonder if I was just fortunate enough that I had some children
> always making progress even if N-1 were stuck.
>
> Dave
>
>
Thread overview: 5+ messages
[not found] <1139550397.1201862.1473415639192.JavaMail.zimbra@redhat.com>
2016-09-09 10:30 ` [bug] child processes stall forever and don't get killed Jan Stancek
2016-09-09 13:32 ` Dave Jones
2016-09-09 14:16 ` Jan Stancek
2016-09-10 1:46 ` Dave Jones
2016-09-13 12:00 ` Jan Stancek [this message]