From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1757752AbYHAJB3 (ORCPT ); Fri, 1 Aug 2008 05:01:29 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1752307AbYHAJBW (ORCPT ); Fri, 1 Aug 2008 05:01:22 -0400
Received: from mx3.mail.elte.hu ([157.181.1.138]:45460 "EHLO mx3.mail.elte.hu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752144AbYHAJBV (ORCPT ); Fri, 1 Aug 2008 05:01:21 -0400
Date: Fri, 1 Aug 2008 11:01:00 +0200
From: Ingo Molnar
To: David Miller
Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, a.p.zijlstra@chello.nl
Subject: Re: [git pull] scheduler fixes
Message-ID: <20080801090100.GA25142@elte.hu>
References: <20080731.150457.77077498.davem@davemloft.net>
	<20080731222624.GA22426@elte.hu>
	<20080731.155504.167792984.davem@davemloft.net>
	<20080801.011122.32782916.davem@davemloft.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20080801.011122.32782916.davem@davemloft.net>
User-Agent: Mutt/1.5.18 (2008-05-17)
X-ELTE-VirusStatus: clean
X-ELTE-SpamScore: -1.5
X-ELTE-SpamLevel:
X-ELTE-SpamCheck: no
X-ELTE-SpamVersion: ELTE 2.0
X-ELTE-SpamCheck-Details: score=-1.5 required=5.9 tests=BAYES_00 autolearn=no
	SpamAssassin version=3.2.3
	-1.5 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
	[score: 0.0000]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

* David Miller wrote:

> From: David Miller
> Date: Thu, 31 Jul 2008 15:55:04 -0700 (PDT)
>
> > I am absolutely sure, I spent the whole night yesterday trying to
> > debug this.
>
> Followup. I lost two days of my life debugging this because seemingly
> nobody can friggin' agree on what to do about the "printk() wakeup
> issue". Thanks!
>
> Can we fix this now, please?
>
> The problem was that Peter's patch triggers a print_deadlock_bug() in
> lockdep.c on the runqueue locks.
>
> But those printk()'s quickly want to do a wakeup, which wants to take
> the runqueue lock this thread already holds. So I would only get the
> first line of the lockdep debugging followed by a complete hang.

ugh. In the context of lockdep you are the first one to trigger this bug
- thanks for the fix!

We had a few other incidents of printks generated by bugs within the
scheduler code causing lockups (due to the wakeup) - and Steve Rostedt
sent a more generic solution for that: to first trylock the runqueue
lock in that case instead of doing an unconditional wakeup.

The patch made it to linux-next but Andrew NAK-ed that patch because it
caused other problems: it made printk wakeups conceptually less
reliable. (a spurious lock taken from another CPU could prevent a printk
wakeup from propagating and could block klogd indefinitely)

> Doing these wakeups in such a BUG message is unwise. Please can we
> apply something like the following and save other developers countless
> wasted hours of their time?
>
> Thanks :-)

applied to tip/core/locking. I'm wondering, does this mean that Peter's:

   lockdep: change scheduler annotation

is still not good enough yet?

	Ingo
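
To illustrate the trylock-before-wakeup idea discussed above, here is a
minimal userspace sketch. It assumes a pthread mutex as a stand-in for
the runqueue lock, and every name in it (rq_lock, wake_klogd_trylock,
report_bug_with_lock_held) is hypothetical. It is not the patch Steve
Rostedt sent and not kernel code; it only shows why a trylock avoids the
self-deadlock and why a wakeup can be silently lost.

```c
/* Userspace sketch only: a pthread mutex stands in for the runqueue
 * lock. All names are hypothetical, none are kernel symbols. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the runqueue lock */
static int klogd_wakeups; /* number of wakeups actually delivered */

/* Try to wake the (imaginary) log daemon. The wakeup needs rq_lock,
 * so take it with trylock and give up if it is already held. */
static bool wake_klogd_trylock(void)
{
	int rc;

	/* Returns 0 on success, nonzero (EBUSY) if the mutex is held by
	 * any thread, including the caller. */
	if (pthread_mutex_trylock(&rq_lock) != 0)
		return false; /* skip the wakeup instead of deadlocking */

	klogd_wakeups++;
	rc = pthread_mutex_unlock(&rq_lock);
	assert(rc == 0);
	(void)rc;
	return true;
}

/* Simulates a printk issued while the caller already holds rq_lock,
 * as in the lockdep report case described in the mail. */
static bool report_bug_with_lock_held(void)
{
	bool woke;

	pthread_mutex_lock(&rq_lock);
	woke = wake_klogd_trylock(); /* an unconditional lock here would hang */
	pthread_mutex_unlock(&rq_lock);
	return woke;
}
```

Calling wake_klogd_trylock() with the lock free delivers the wakeup.
Calling report_bug_with_lock_held() returns false without hanging, but
the wakeup is dropped. The same drop would happen if another thread
merely happened to hold the lock at that moment, which is the
reliability concern behind the NAK.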