From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI,NICE_REPLY_A, SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2EF71C47082 for ; Mon, 7 Jun 2021 19:32:30 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 73864610E7 for ; Mon, 7 Jun 2021 19:32:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 73864610E7 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=eldorado.org.br Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:54644 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lqKz5-00027x-Up for qemu-devel@archiver.kernel.org; Mon, 07 Jun 2021 15:32:28 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:58636) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1lqKw7-0000C7-FS; Mon, 07 Jun 2021 15:29:23 -0400 Received: from [201.28.113.2] (port=41930 helo=outlook.eldorado.org.br) by eggs.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1lqKw5-00008j-2I; Mon, 07 Jun 2021 15:29:23 -0400 Received: from power9a ([10.10.71.235]) by outlook.eldorado.org.br with Microsoft SMTPSVC(8.5.9600.16384); Mon, 7 Jun 2021 16:29:15 -0300 Received: from [127.0.0.1] (unknown [10.10.71.235]) by power9a (Postfix) with ESMTPS id 67CD1801387; Mon, 7 Jun 2021 16:29:15 -0300 (-03) Subject: Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus To: Richard Henderson , qemu-devel@nongnu.org References: <20210602191822.90182-1-bruno.larsen@eldorado.org.br> <39c92ce9-46b8-4847-974c-647c7a5ca2ae@eldorado.org.br> From: Bruno Piazera Larsen Message-ID: Date: Mon, 7 Jun 2021 16:29:15 -0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/alternative; boundary="------------61C86F1951D8AD48B965B624" Content-Language: en-US X-OriginalArrivalTime: 07 Jun 2021 19:29:15.0773 (UTC) FILETIME=[6729AAD0:01D75BD3] X-Host-Lookup-Failed: Reverse DNS lookup failed for 201.28.113.2 (failed) Received-SPF: pass client-ip=201.28.113.2; envelope-from=bruno.larsen@eldorado.org.br; helo=outlook.eldorado.org.br X-Spam_score_int: -10 X-Spam_score: -1.1 X-Spam_bar: - X-Spam_report: (-1.1 / 5.0 requ) BAYES_00=-1.9, HTML_MESSAGE=0.001, NICE_REPLY_A=-0.001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: farosas@linux.ibm.com, luis.pires@eldorado.org.br, Greg Kurz , lucas.araujo@eldorado.org.br, fernando.valle@eldorado.org.br, qemu-ppc@nongnu.org, matheus.ferst@eldorado.org.br, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" This is a multi-part message in MIME format. --------------61C86F1951D8AD48B965B624 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit On 02/06/2021 19:19, Richard Henderson wrote: > On 6/2/21 12:58 PM, Bruno Piazera Larsen wrote: >>> For the use from ppc_cpu_get_phys_page_debug, you'd pass in >>> cpu_mmu_index(env, false). >> >> ppc_cpu_get_phys_page_debug has 2 calls to ppc_xlate, one using the >> data MMU, the other using the instruction MMU. I'm guessing I should >> pass both, right? > > Yes. > >> But here we have another bit that confuses me: cpu_mmu_index returns >> 0 if in user mode, or uses the information stored in env to get it, >> so I don't see how that would be different from getting directly. >> Unless the point is to have ppc_*_xlate be generic and pc_*_debug >> knows the info in env is correct. Is that it? > > The issue is that > > (1) ppc_*_xlate should perform the lookup requested, and mmu_idx >     does not *necessarily* correspond to the current contents of >     env->msr et al.  See (2). > > (2) There is a secondary call to ppc_radix64_partition_scoped_xlate >     for which the second stage page table should be read >     with hypervisor permissions, and not the permissions of the >     original memory access. > >     Note that ppc_radix64_check_prot checks msr_pr directly. > >     Thus the second stage lookup should use mmu_idx = 5 >     (HV kernel virtual mode).  If I understand things correctly... > >> >>> >>> >>>> +    const short HV = 1, IR = 2, DR = 3; >>>> +    bool MSR[3]; >>>> +    MSR[HV] = dmmu_idx & 2, >>>> +    MSR[IR] = immu_idx & 4, >>>> +    MSR[DR] = dmmu_idx & 4; >>> >>> There's no point in the array.  Just use three different scalars >>> (real_mode, hv, and pr (note that pr is the major portion of the bug >>> as reported)). Additionally, you'll not be distinguishing immu_idx >>> and dmmu_idx, but using the single idx that's given. >> >> Ah, yeah, that's the "more complex than necessary, but it was easy >> for me to read" part. Scalars are a good solution. In this function >> in specific, PR doesn't actually show up anywhere, so I would >> actually only need 2. Anyway, will start working on this. > > Oh, I'll note that your constants above are wrong.  I think that you > should have some common routines in (mmu-)internal.h: > > /* >  * These correspond to the mmu_idx values computed in >  * hreg_compute_hflags_value.  See the tables therein. >  */ > static inline bool mmuidx_pr(int idx) { return idx & 1; } > static inline bool mmuidx_real(int idx) { return idx & 2; } > static inline bool mmuidx_hv(int idx) { return idx & 4; } > > because you'll want to use these past mmu-radix64.c. > > Then you also have a single place to adjust if the mmu_idx are > reordered at a later date. > > > r~ I just tried sending mmu_idx all the way down, but I ran into a very weird bug of gcc. If we have to add one more parameter that GCC can't just optimize away we get at least a slow down of 5x for the first test of check-acceptance (could be more, but the test times out after 900 seconds, so I'm not sure). One way that I managed to get around that is saving the current MSR, setting it to 5, and restoring after the xlate call. The code ended up something like: int new_idx = (5<msr & clr; clr = ~clr; /* set new msr so we don't need to send the mmu_idx */ env->msr = (env->msr & clr) | new_idx; ret = ppc_radix64_partition_scoped_xlate(...); /* restore old mmu_idx */ env->msr = (env->msr & clr) | old_idx; That way we continue to use the env as the way to send the variable and avoid what I think is a problem with the compiler's optimization. Would this be acceptable (with proper documentation in the form of comments, ofc) or do we have to find a better solution that doesn't touch the values of env? I personally don't like it, but I couldn't find a better solution. If you're fine with it, we can use it, otherwise I'll keep looking. -- Bruno Piazera Larsen Instituto de Pesquisas ELDORADO Departamento Computação Embarcada Analista de Software Trainee Aviso Legal - Disclaimer If --------------61C86F1951D8AD48B965B624 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit


On 02/06/2021 19:19, Richard Henderson wrote:
On 6/2/21 12:58 PM, Bruno Piazera Larsen wrote:
For the use from ppc_cpu_get_phys_page_debug, you'd pass in cpu_mmu_index(env, false).

ppc_cpu_get_phys_page_debug has 2 calls to ppc_xlate, one using the data MMU, the other using the instruction MMU. I'm guessing I should pass both, right?

Yes.

But here we have another bit that confuses me: cpu_mmu_index returns 0 if in user mode, or uses the information stored in env to get it, so I don't see how that would be different from getting directly. Unless the point is to have ppc_*_xlate be generic and pc_*_debug knows the info in env is correct. Is that it?

The issue is that

(1) ppc_*_xlate should perform the lookup requested, and mmu_idx
    does not *necessarily* correspond to the current contents of
    env->msr et al.  See (2).

(2) There is a secondary call to ppc_radix64_partition_scoped_xlate
    for which the second stage page table should be read
    with hypervisor permissions, and not the permissions of the
    original memory access.

    Note that ppc_radix64_check_prot checks msr_pr directly.

    Thus the second stage lookup should use mmu_idx = 5
    (HV kernel virtual mode).  If I understand things correctly...




+    const short HV = 1, IR = 2, DR = 3;
+    bool MSR[3];
+    MSR[HV] = dmmu_idx & 2,
+    MSR[IR] = immu_idx & 4,
+    MSR[DR] = dmmu_idx & 4;

There's no point in the array.  Just use three different scalars (real_mode, hv, and pr (note that pr is the major portion of the bug as reported)). Additionally, you'll not be distinguishing immu_idx and dmmu_idx, but using the single idx that's given.

Ah, yeah, that's the "more complex than necessary, but it was easy for me to read" part. Scalars are a good solution. In this function in specific, PR doesn't actually show up anywhere, so I would actually only need 2. Anyway, will start working on this.

Oh, I'll note that your constants above are wrong.  I think that you should have some common routines in (mmu-)internal.h:

/*
 * These correspond to the mmu_idx values computed in
 * hreg_compute_hflags_value.  See the tables therein.
 */
static inline bool mmuidx_pr(int idx) { return idx & 1; }
static inline bool mmuidx_real(int idx) { return idx & 2; }
static inline bool mmuidx_hv(int idx) { return idx & 4; }

because you'll want to use these past mmu-radix64.c.

Then you also have a single place to adjust if the mmu_idx are reordered at a later date.


r~

I just tried sending mmu_idx all the way down, but I ran into a very weird bug of gcc. If we have to add one more parameter that GCC can't just optimize away we get at least a slow down of 5x for the first test of check-acceptance (could be more, but the test times out after 900 seconds, so I'm not sure). One way that I managed to get around that is saving the current MSR, setting it to 5, and restoring after the xlate call. The code ended up something like:

int new_idx = (5<<HFLAGS_IMMU_IDX) | (5<<HFLAGS_DMMU_IDX);
int clr = (7<<HFLAGS_IMMU_IDX) | (7<<HFLAGS_DMMU_IDX);
int old_idx = env->msr & clr;
clr = ~clr;
/* set new msr so we don't need to send the mmu_idx */
env->msr = (env->msr & clr) | new_idx;
ret = ppc_radix64_partition_scoped_xlate(...);
/* restore old mmu_idx */
env->msr = (env->msr & clr) | old_idx;

That way we continue to use the env as the way to send the variable and avoid what I think is a problem with the compiler's optimization.

Would this be acceptable (with proper documentation in the form of comments, ofc) or do we have to find a better solution that doesn't touch the values of env? I personally don't like it, but I couldn't find a better solution. If you're fine with it, we can use it, otherwise I'll keep looking.

--
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer If
--------------61C86F1951D8AD48B965B624--