Subject: Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
From: Bruno Piazera Larsen
To: Richard Henderson, qemu-devel@nongnu.org
Cc: farosas@linux.ibm.com, luis.pires@eldorado.org.br, Greg Kurz, lucas.araujo@eldorado.org.br, fernando.valle@eldorado.org.br, qemu-ppc@nongnu.org, matheus.ferst@eldorado.org.br, david@gibson.dropbear.id.au
Date: Tue, 8 Jun 2021 15:39:25 -0300
Message-ID: <524da033-7d6a-5720-026a-04fbc01e40c3@eldorado.org.br>
In-Reply-To: <258682ee-0ae6-cf59-e7bf-42879abcde5b@eldorado.org.br>
References: <20210602191822.90182-1-bruno.larsen@eldorado.org.br> <39c92ce9-46b8-4847-974c-647c7a5ca2ae@eldorado.org.br> <7198ccf1-f2db-2e39-3778-4083b5d7fa45@linaro.org> <258682ee-0ae6-cf59-e7bf-42879abcde5b@eldorado.org.br>

On 08/06/2021 13:37, Bruno Piazera Larsen wrote:
> On 08/06/2021 12:35, Richard Henderson wrote:
>> On 6/8/21 7:39 AM, Bruno Piazera Larsen wrote:
>>>> That's odd. We already have more arguments than the number of
>>>> argument registers... A 5x slowdown is distinctly odd.
>>> I did some more digging and the problem is not with
>>> ppc_radix64_check_prot, the problem is ppc_radix64_xlate, which
>>> currently has 7 arguments and we're increasing to 8. 7 feels like
>>> the correct number, but I couldn't find docs supporting it, so I
>>> could be wrong.
>>
>> According to tcg/ppc/tcg-target.c.inc, there are 8 argument registers
>> for ppc hosts. But now I see you didn't actually say on which host
>> you observed the problem... It's 6 argument registers for x86_64 host.
>
> Oh, yes, sorry. I'm experiencing it in a POWER9 machine (ppc64le
> architecture). According to tcg this shouldn't be the issue, then, so
> idk if that's the real reason or not. All I know is that as soon as
> gcc can't optimize an argument away it happens (fprintf in
> radix64_xlate, using one of the mmuidx_* functions, defining those as
> macros).
>
> I'll test it in my x86_64 machine and see if such a slowdown happens.
> It's not conclusive evidence, but the function is too complex for me
> to follow the disassembly if I can avoid it...

Test done: the slowdown also happens on the x86_64 machine (though
without the change the run already takes 360s, so idk if the slowdown
is as dramatic there), so it's _probably_ not caused by going over the
argument register count. I have no clue what it could be. Still working
on the struct version to see if anything changes.
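
For readers following the thread, here is a minimal sketch of what a "struct version" could look like in general: the scalar parameters are bundled into one struct, so the callee takes two arguments instead of eight. Every name below (ToyXlateArgs, toy_xlate_struct, toy_xlate_many_args, the TOY_PROT_* flags) is hypothetical, and the translation logic is a toy identity mapping; none of it is taken from QEMU's ppc_radix64_xlate. Whether this changes the slowdown is exactly what is still being tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical permission bits, for illustration only. */
enum { TOY_PROT_READ = 1, TOY_PROT_WRITE = 2, TOY_PROT_EXEC = 4 };

/* Hypothetical bundle of inputs and outputs of a translation call. */
typedef struct ToyXlateArgs {
    /* inputs */
    uint64_t eaddr;      /* effective address to translate */
    int access_type;     /* TOY_PROT_* bit(s) the access requires */
    int mmu_idx;         /* 0 is treated as privileged in this toy model */
    bool guest_visible;  /* carried along, unused by the toy logic */
    /* outputs */
    uint64_t raddr;      /* translated (real) address */
    int psize;           /* log2 of the page size */
    int prot;            /* permissions granted */
} ToyXlateArgs;

/* Struct form: two arguments, whatever the number of fields. */
int toy_xlate_struct(const void *cpu, ToyXlateArgs *a)
{
    (void)cpu;
    assert(a != NULL);

    a->prot = TOY_PROT_READ | TOY_PROT_EXEC;
    if (a->mmu_idx == 0) {
        a->prot |= TOY_PROT_WRITE;
    }
    a->psize = 12;          /* pretend every page is 4 KiB */
    a->raddr = a->eaddr;    /* toy identity mapping */

    /* Fail if the access needs a permission that was not granted. */
    return (a->access_type & ~a->prot) ? -1 : 0;
}

/* Scalar form: eight arguments, kept as a thin wrapper for comparison. */
int toy_xlate_many_args(const void *cpu, uint64_t eaddr, int access_type,
                        int mmu_idx, bool guest_visible,
                        uint64_t *raddr, int *psize, int *prot)
{
    ToyXlateArgs a = {
        .eaddr = eaddr,
        .access_type = access_type,
        .mmu_idx = mmu_idx,
        .guest_visible = guest_visible,
    };
    int ret = toy_xlate_struct(cpu, &a);

    *raddr = a.raddr;
    *psize = a.psize;
    *prot = a.prot;
    return ret;
}
```

A caller of the struct form fills in the input fields, passes the struct's address, and reads raddr, psize and prot back from the same struct. The trade-off is that those values then live in memory rather than in argument registers, so this is a change of shape and not a guaranteed speedup.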

--
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO
Embedded Computing Department
Trainee Software Analyst

