Date: Thu, 8 May 2008 07:20:49 -0600
From: Matthew Wilcox
To: Ingo Molnar
Cc: Linus Torvalds, "Zhang, Yanmin", Andi Kleen, LKML, Alexander Viro, Andrew Morton, Thomas Gleixner, "H. Peter Anvin"
Subject: Re: [patch] speed up / fix the new generic semaphore code (fix AIM7 40% regression with 2.6.26-rc1)
Message-ID: <20080508132049.GG19219@parisc-linux.org>
In-Reply-To: <20080508120130.GA2860@elte.hu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 08, 2008 at 02:01:30PM +0200, Ingo Molnar wrote:
> Looking at the workload i found and fixed what i believe to be the real
> bug causing the AIM7 regression: it was inefficient wakeup / scheduling
> / locking behavior of the new generic semaphore code, causing suboptimal
> performance.

I did note that earlier downthread, although, to be fair, I thought of it in terms of three tasks: a third task comes in and steals the second task's wakeup. I hadn't considered the first task starving the second by repeatedly locking and unlocking the semaphore.
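Both scenarios turn on what up() does with a sleeping waiter. The following is a minimal, single-threaded model of two possible policies, offered as a sketch under stated assumptions rather than the actual kernel/semaphore.c code: every name in it (struct model_sem, model_up(), HANDOFF, OPPORTUNISTIC, and so on) is hypothetical, waiters are assumed to be woken in FIFO order, and there is no real sleeping or locking. Under HANDOFF, up() reserves the unit for the woken waiter, so a releaser that immediately re-locks cannot get it. Under OPPORTUNISTIC, up() simply returns the unit to the pool, so any running task (the releaser itself, or a third task) can grab it first and the woken waiter goes back to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wakeup policies for a toy semaphore model. */
enum wake_policy { HANDOFF, OPPORTUNISTIC };

struct model_sem {
    enum wake_policy policy;
    int count;       /* units free for any running task to take */
    int nr_waiters;  /* tasks asleep in down(), assumed FIFO */
    int nr_granted;  /* HANDOFF only: units already owned by a woken waiter */
};

/* Non-blocking acquire by a task that is currently running. */
static bool model_trylock(struct model_sem *s)
{
    if (s->count > 0) {
        s->count--;
        return true;
    }
    return false;
}

/* Release one unit, waking the first waiter if there is one. */
static void model_up(struct model_sem *s)
{
    if (s->nr_waiters == 0) {
        s->count++;
        return;
    }
    s->nr_waiters--;           /* wake the first waiter */
    if (s->policy == HANDOFF)
        s->nr_granted++;       /* it owns the unit before it even runs */
    else
        s->count++;            /* it must compete for the unit when it runs */
}

/* A previously woken waiter is finally scheduled. Returns true if it
 * now holds the semaphore; otherwise it goes back to sleep. */
static bool model_waiter_runs(struct model_sem *s)
{
    if (s->policy == HANDOFF) {
        assert(s->nr_granted > 0);
        s->nr_granted--;
        return true;
    }
    if (model_trylock(s))
        return true;
    s->nr_waiters++;           /* lost the race: sleep again */
    return false;
}

/* Task A holds the semaphore (count 0) and task B is asleep in down().
 * A releases it and, a few instructions later, tries to take it again
 * before B has been scheduled. Returns true if A re-acquired it without
 * sleeping; *b_got_it reports whether B got the semaphore when it ran. */
static bool scenario_releaser_relocks(enum wake_policy policy, bool *b_got_it)
{
    struct model_sem s = { .policy = policy, .count = 0,
                           .nr_waiters = 1, .nr_granted = 0 };
    model_up(&s);                       /* A releases, B is woken */
    bool a_got_it = model_trylock(&s);  /* A re-locks before B runs */
    *b_got_it = model_waiter_runs(&s);  /* B is finally scheduled */
    return a_got_it;
}
```

In this model, HANDOFF gives the semaphore to B and forces A to wait for B to be scheduled, while OPPORTUNISTIC lets A keep going and sends B back to sleep. Repeating the OPPORTUNISTIC case in a loop is exactly the pattern in which the first task can starve the second.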
> So if the old owner, even if just a few instructions later, does a
> down() [lock_kernel()] again, it will be blocked and will have to wait
> on the new owner to eventually be scheduled (possibly on another CPU)!
> Or if another other task gets to lock_kernel() sooner than the "new
> owner" scheduled, it will be blocked unnecessarily and for a very long
> time when there are 2000 tasks running.
>
> I.e. the implementation of the new semaphores code does wake-one and
> lock ownership in a very restrictive way - it does not allow
> opportunistic re-locking of the lock at all and keeps the scheduler from
> picking task order intelligently.

Fairness is certainly the enemy of throughput (see also the dbench arguments, passim). Some semaphore users may really want fairness, but it seems pretty clear that we don't want it for the BKL.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."