[RFC PATCH] target/ppc: fix address translation bug for hash table mmus

Bruno Larsen (billionai) posted 1 patch 2 years, 10 months ago
Failed in applying to current master (apply log)
target/ppc/mmu-hash64.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
[RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Bruno Larsen (billionai) 2 years, 10 months ago
Based-on: <20210518201146.794854-1-richard.henderson@linaro.org>

This commit attempts to implement a first draft of a solution to the
first bug mentioned by Richard Henderson in this e-mail
https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06247.html
The second bug was not touched, which is basically implementing the
solution C

To sumarize the first bug here, from my understanding, when an address
translation is asked of a 64bit mmu that uses hashtables, the code
attempts to check some permission bits, but checks them from the wrong
location.

The solution implemented here is more complex than necessary on
purpose, to make it more readable (and make sure I understand what is
going on). If that would really fix the problem, I'll move to
implementing an actual solution, and to all affected functions.

Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br>
---
 target/ppc/mmu-hash64.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
index c1b98a97e9..63f10f1be7 100644
--- a/target/ppc/mmu-hash64.c
+++ b/target/ppc/mmu-hash64.c
@@ -887,6 +887,14 @@ bool ppc_hash64_xlate(PowerPCCPU *cpu, vaddr eaddr, MMUAccessType access_type,
     int exec_prot, pp_prot, amr_prot, prot;
     int need_prot;
     hwaddr raddr;
+    unsigned immu_idx, dmmu_idx;
+    immu_idx = (env->hflags >> HFLAGS_IMMU_IDX) & 7;
+    dmmu_idx = (env->hflags >> HFLAGS_DMMU_IDX) & 7;
+    const short HV = 1, IR = 2, DR = 3;
+    bool MSR[3];
+    MSR[HV] = dmmu_idx & 2,
+    MSR[IR] = immu_idx & 4,
+    MSR[DR] = dmmu_idx & 4;
 
     /*
      * Note on LPCR usage: 970 uses HID4, but our special variant of
@@ -897,7 +905,7 @@ bool ppc_hash64_xlate(PowerPCCPU *cpu, vaddr eaddr, MMUAccessType access_type,
      */
 
     /* 1. Handle real mode accesses */
-    if (access_type == MMU_INST_FETCH ? !msr_ir : !msr_dr) {
+    if (access_type == MMU_INST_FETCH ? !MSR[IR] : !MSR[DR]) {
         /*
          * Translation is supposedly "off", but in real mode the top 4
          * effective address bits are (mostly) ignored
@@ -909,7 +917,7 @@ bool ppc_hash64_xlate(PowerPCCPU *cpu, vaddr eaddr, MMUAccessType access_type,
              * In virtual hypervisor mode, there's nothing to do:
              *   EA == GPA == qemu guest address
              */
-        } else if (msr_hv || !env->has_hv_mode) {
+        } else if (MSR[HV] || !env->has_hv_mode) {
             /* In HV mode, add HRMOR if top EA bit is clear */
             if (!(eaddr >> 63)) {
                 raddr |= env->spr[SPR_HRMOR];
-- 
2.17.1


Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Richard Henderson 2 years, 10 months ago
On 6/2/21 12:18 PM, Bruno Larsen (billionai) wrote:
> Based-on: <20210518201146.794854-1-richard.henderson@linaro.org>
> 
> This commit attempts to implement a first draft of a solution to the
> first bug mentioned by Richard Henderson in this e-mail
> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06247.html
> The second bug was not touched, which is basically implementing the
> solution C
> 
> To sumarize the first bug here, from my understanding, when an address
> translation is asked of a 64bit mmu that uses hashtables, the code
> attempts to check some permission bits, but checks them from the wrong
> location.
> 
> The solution implemented here is more complex than necessary on
> purpose, to make it more readable (and make sure I understand what is
> going on). If that would really fix the problem, I'll move to
> implementing an actual solution, and to all affected functions.
> 
> Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br>
> ---
>   target/ppc/mmu-hash64.c | 12 ++++++++++--
>   1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
> index c1b98a97e9..63f10f1be7 100644
> --- a/target/ppc/mmu-hash64.c
> +++ b/target/ppc/mmu-hash64.c
> @@ -887,6 +887,14 @@ bool ppc_hash64_xlate(PowerPCCPU *cpu, vaddr eaddr, MMUAccessType access_type,
>       int exec_prot, pp_prot, amr_prot, prot;
>       int need_prot;
>       hwaddr raddr;
> +    unsigned immu_idx, dmmu_idx;
> +    immu_idx = (env->hflags >> HFLAGS_IMMU_IDX) & 7;
> +    dmmu_idx = (env->hflags >> HFLAGS_DMMU_IDX) & 7;

This doesn't help at all with the reported bug. You're still reading from env. 
You need the mmu_idx that was passed to ppc_cpu_tlb_fill.

For the use from ppc_cpu_get_phys_page_debug, you'd pass in cpu_mmu_index(env, 
false).


> +    const short HV = 1, IR = 2, DR = 3;
> +    bool MSR[3];
> +    MSR[HV] = dmmu_idx & 2,
> +    MSR[IR] = immu_idx & 4,
> +    MSR[DR] = dmmu_idx & 4;

There's no point in the array.  Just use three different scalars (real_mode, 
hv, and pr (note that pr is the major portion of the bug as reported)). 
Additionally, you'll not be distinguishing immu_idx and dmmu_idx, but using the 
single idx that's given.

> -    if (access_type == MMU_INST_FETCH ? !msr_ir : !msr_dr) {
> +    if (access_type == MMU_INST_FETCH ? !MSR[IR] : !MSR[DR]) {

Which simplifies this condition to just a single test.


r~

Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Bruno Piazera Larsen 2 years, 10 months ago
On 02/06/2021 16:26, Richard Henderson wrote:
> On 6/2/21 12:18 PM, Bruno Larsen (billionai) wrote:
>> Based-on: <20210518201146.794854-1-richard.henderson@linaro.org>
>>
>> This commit attempts to implement a first draft of a solution to the
>> first bug mentioned by Richard Henderson in this e-mail
>> https://lists.nongnu.org/archive/html/qemu-devel/2021-05/msg06247.html
>> The second bug was not touched, which is basically implementing the
>> solution C
>>
>> To sumarize the first bug here, from my understanding, when an address
>> translation is asked of a 64bit mmu that uses hashtables, the code
>> attempts to check some permission bits, but checks them from the wrong
>> location.
>>
>> The solution implemented here is more complex than necessary on
>> purpose, to make it more readable (and make sure I understand what is
>> going on). If that would really fix the problem, I'll move to
>> implementing an actual solution, and to all affected functions.
>>
>> Signed-off-by: Bruno Larsen (billionai) <bruno.larsen@eldorado.org.br>
>> ---
>>   target/ppc/mmu-hash64.c | 12 ++++++++++--
>>   1 file changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/ppc/mmu-hash64.c b/target/ppc/mmu-hash64.c
>> index c1b98a97e9..63f10f1be7 100644
>> --- a/target/ppc/mmu-hash64.c
>> +++ b/target/ppc/mmu-hash64.c
>> @@ -887,6 +887,14 @@ bool ppc_hash64_xlate(PowerPCCPU *cpu, vaddr 
>> eaddr, MMUAccessType access_type,
>>       int exec_prot, pp_prot, amr_prot, prot;
>>       int need_prot;
>>       hwaddr raddr;
>> +    unsigned immu_idx, dmmu_idx;
>> +    immu_idx = (env->hflags >> HFLAGS_IMMU_IDX) & 7;
>> +    dmmu_idx = (env->hflags >> HFLAGS_DMMU_IDX) & 7;
>
> This doesn't help at all with the reported bug. You're still reading 
> from env. You need the mmu_idx that was passed to ppc_cpu_tlb_fill.
Ah, I saw a macro for MMU_IDX and assumed they pointed to different 
locations. Ok, the fix for ppc_cpu_tlb_fill should be easy, then
>
> For the use from ppc_cpu_get_phys_page_debug, you'd pass in 
> cpu_mmu_index(env, false).

ppc_cpu_get_phys_page_debug has 2 calls to ppc_xlate, one using the data 
MMU, the other using the instruction MMU. I'm guessing I should pass 
both, right?

But here we have another bit that confuses me: cpu_mmu_index returns 0 
if in user mode, or uses the information stored in env to get it, so I 
don't see how that would be different from getting directly. Unless the 
point is to have ppc_*_xlate be generic and pc_*_debug knows the info in 
env is correct. Is that it?

>
>
>> +    const short HV = 1, IR = 2, DR = 3;
>> +    bool MSR[3];
>> +    MSR[HV] = dmmu_idx & 2,
>> +    MSR[IR] = immu_idx & 4,
>> +    MSR[DR] = dmmu_idx & 4;
>
> There's no point in the array.  Just use three different scalars 
> (real_mode, hv, and pr (note that pr is the major portion of the bug 
> as reported)). Additionally, you'll not be distinguishing immu_idx and 
> dmmu_idx, but using the single idx that's given.

Ah, yeah, that's the "more complex than necessary, but it was easy for 
me to read" part. Scalars are a good solution. In this function in 
specific, PR doesn't actually show up anywhere, so I would actually only 
need 2. Anyway, will start working on this.

>
>> -    if (access_type == MMU_INST_FETCH ? !msr_ir : !msr_dr) {
>> +    if (access_type == MMU_INST_FETCH ? !MSR[IR] : !MSR[DR]) {
>
> Which simplifies this condition to just a single test.
>
>
> r~
-- 
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Richard Henderson 2 years, 10 months ago
On 6/2/21 12:58 PM, Bruno Piazera Larsen wrote:
>> For the use from ppc_cpu_get_phys_page_debug, you'd pass in 
>> cpu_mmu_index(env, false).
> 
> ppc_cpu_get_phys_page_debug has 2 calls to ppc_xlate, one using the data MMU, 
> the other using the instruction MMU. I'm guessing I should pass both, right?

Yes.

> But here we have another bit that confuses me: cpu_mmu_index returns 0 if in 
> user mode, or uses the information stored in env to get it, so I don't see how 
> that would be different from getting directly. Unless the point is to have 
> ppc_*_xlate be generic and pc_*_debug knows the info in env is correct. Is that it?

The issue is that

(1) ppc_*_xlate should perform the lookup requested, and mmu_idx
     does not *necessarily* correspond to the current contents of
     env->msr et al.  See (2).

(2) There is a secondary call to ppc_radix64_partition_scoped_xlate
     for which the second stage page table should be read
     with hypervisor permissions, and not the permissions of the
     original memory access.

     Note that ppc_radix64_check_prot checks msr_pr directly.

     Thus the second stage lookup should use mmu_idx = 5
     (HV kernel virtual mode).  If I understand things correctly...

> 
>>
>>
>>> +    const short HV = 1, IR = 2, DR = 3;
>>> +    bool MSR[3];
>>> +    MSR[HV] = dmmu_idx & 2,
>>> +    MSR[IR] = immu_idx & 4,
>>> +    MSR[DR] = dmmu_idx & 4;
>>
>> There's no point in the array.  Just use three different scalars (real_mode, 
>> hv, and pr (note that pr is the major portion of the bug as reported)). 
>> Additionally, you'll not be distinguishing immu_idx and dmmu_idx, but using 
>> the single idx that's given.
> 
> Ah, yeah, that's the "more complex than necessary, but it was easy for me to 
> read" part. Scalars are a good solution. In this function in specific, PR 
> doesn't actually show up anywhere, so I would actually only need 2. Anyway, 
> will start working on this.

Oh, I'll note that your constants above are wrong.  I think that you should 
have some common routines in (mmu-)internal.h:

/*
  * These correspond to the mmu_idx values computed in
  * hreg_compute_hflags_value.  See the tables therein.
  */
static inline bool mmuidx_pr(int idx) { return idx & 1; }
static inline bool mmuidx_real(int idx) { return idx & 2; }
static inline bool mmuidx_hv(int idx) { return idx & 4; }

because you'll want to use these past mmu-radix64.c.

Then you also have a single place to adjust if the mmu_idx are reordered at a 
later date.


r~

Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Bruno Piazera Larsen 2 years, 9 months ago
On 02/06/2021 19:19, Richard Henderson wrote:
> On 6/2/21 12:58 PM, Bruno Piazera Larsen wrote:
>>> For the use from ppc_cpu_get_phys_page_debug, you'd pass in 
>>> cpu_mmu_index(env, false).
>>
>> ppc_cpu_get_phys_page_debug has 2 calls to ppc_xlate, one using the 
>> data MMU, the other using the instruction MMU. I'm guessing I should 
>> pass both, right?
>
> Yes.
>
>> But here we have another bit that confuses me: cpu_mmu_index returns 
>> 0 if in user mode, or uses the information stored in env to get it, 
>> so I don't see how that would be different from getting directly. 
>> Unless the point is to have ppc_*_xlate be generic and pc_*_debug 
>> knows the info in env is correct. Is that it?
>
> The issue is that
>
> (1) ppc_*_xlate should perform the lookup requested, and mmu_idx
>     does not *necessarily* correspond to the current contents of
>     env->msr et al.  See (2).
>
> (2) There is a secondary call to ppc_radix64_partition_scoped_xlate
>     for which the second stage page table should be read
>     with hypervisor permissions, and not the permissions of the
>     original memory access.
>
>     Note that ppc_radix64_check_prot checks msr_pr directly.
>
>     Thus the second stage lookup should use mmu_idx = 5
>     (HV kernel virtual mode).  If I understand things correctly...
>
>>
>>>
>>>
>>>> +    const short HV = 1, IR = 2, DR = 3;
>>>> +    bool MSR[3];
>>>> +    MSR[HV] = dmmu_idx & 2,
>>>> +    MSR[IR] = immu_idx & 4,
>>>> +    MSR[DR] = dmmu_idx & 4;
>>>
>>> There's no point in the array.  Just use three different scalars 
>>> (real_mode, hv, and pr (note that pr is the major portion of the bug 
>>> as reported)). Additionally, you'll not be distinguishing immu_idx 
>>> and dmmu_idx, but using the single idx that's given.
>>
>> Ah, yeah, that's the "more complex than necessary, but it was easy 
>> for me to read" part. Scalars are a good solution. In this function 
>> in specific, PR doesn't actually show up anywhere, so I would 
>> actually only need 2. Anyway, will start working on this.
>
> Oh, I'll note that your constants above are wrong.  I think that you 
> should have some common routines in (mmu-)internal.h:
>
> /*
>  * These correspond to the mmu_idx values computed in
>  * hreg_compute_hflags_value.  See the tables therein.
>  */
> static inline bool mmuidx_pr(int idx) { return idx & 1; }
> static inline bool mmuidx_real(int idx) { return idx & 2; }
> static inline bool mmuidx_hv(int idx) { return idx & 4; }
>
> because you'll want to use these past mmu-radix64.c.
>
> Then you also have a single place to adjust if the mmu_idx are 
> reordered at a later date.
>
>
> r~

I just tried sending mmu_idx all the way down, but I ran into a very 
weird bug of gcc. If we have to add one more parameter that GCC can't 
just optimize away we get at least a slow down of 5x for the first test 
of check-acceptance (could be more, but the test times out after 900 
seconds, so I'm not sure). One way that I managed to get around that is 
saving the current MSR, setting it to 5, and restoring after the xlate 
call. The code ended up something like:

int new_idx = (5<<HFLAGS_IMMU_IDX) | (5<<HFLAGS_DMMU_IDX);
int clr = (7<<HFLAGS_IMMU_IDX) | (7<<HFLAGS_DMMU_IDX);
int old_idx = env->msr & clr;
clr = ~clr;
/* set new msr so we don't need to send the mmu_idx */
env->msr = (env->msr & clr) | new_idx;
ret = ppc_radix64_partition_scoped_xlate(...);
/* restore old mmu_idx */
env->msr = (env->msr & clr) | old_idx;

That way we continue to use the env as the way to send the variable and 
avoid what I think is a problem with the compiler's optimization.

Would this be acceptable (with proper documentation in the form of 
comments, ofc) or do we have to find a better solution that doesn't 
touch the values of env? I personally don't like it, but I couldn't find 
a better solution. If you're fine with it, we can use it, otherwise I'll 
keep looking.

-- 
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer If
<https://www.eldorado.org.br/disclaimer.html>
Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Richard Henderson 2 years, 9 months ago
On 6/7/21 12:29 PM, Bruno Piazera Larsen wrote:
> I just tried sending mmu_idx all the way down, but I ran into a very weird bug 
> of gcc. If we have to add one more parameter that GCC can't just optimize away 
> we get at least a slow down of 5x for the first test of check-acceptance (could 
> be more, but the test times out after 900 seconds, so I'm not sure).

That's odd.  We already have more arguments than the number of argument 
registers...  A 5x slowdown is distinctly odd.


> One way 
> that I managed to get around that is saving the current MSR, setting it to 5, 
> and restoring after the xlate call. The code ended up something like:
> 
> int new_idx = (5<<HFLAGS_IMMU_IDX) | (5<<HFLAGS_DMMU_IDX);
> int clr = (7<<HFLAGS_IMMU_IDX) | (7<<HFLAGS_DMMU_IDX);
> int old_idx = env->msr & clr;
> clr = ~clr;
> /* set new msr so we don't need to send the mmu_idx */
> env->msr = (env->msr & clr) | new_idx;
> ret = ppc_radix64_partition_scoped_xlate(...);
> /* restore old mmu_idx */
> env->msr = (env->msr & clr) | old_idx;

No, this is silly.

We need to do one of two things:
   - make sure everything is inlined,
   - reduce the number of arguments.

We're currently passing in 9 arguments, which really is too many already.  We 
should be using something akin to mmu_ctx_t, but probably specific to radix64 
without the random stuff collected for random other mmu models.


r~

Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Bruno Piazera Larsen 2 years, 9 months ago
On 07/06/2021 18:06, Richard Henderson wrote:
> On 6/7/21 12:29 PM, Bruno Piazera Larsen wrote:
>> I just tried sending mmu_idx all the way down, but I ran into a very 
>> weird bug of gcc. If we have to add one more parameter that GCC can't 
>> just optimize away we get at least a slow down of 5x for the first 
>> test of check-acceptance (could be more, but the test times out after 
>> 900 seconds, so I'm not sure).
>
> That's odd.  We already have more arguments than the number of 
> argument registers...  A 5x slowdown is distinctly odd.
I did some more digging and the problem is not with 
ppc_radix64_check_prot, the problem is ppc_radix64_xlate, which 
currently has 7 arguments and we're increasing to 8. 7 feels like the 
correct number, but I couldn't find docs supporting it, so I could be 
wrong.
>
>
>> One way that I managed to get around that is saving the current MSR, 
>> setting it to 5, and restoring after the xlate call. The code ended 
>> up something like:
>>
>> int new_idx = (5<<HFLAGS_IMMU_IDX) | (5<<HFLAGS_DMMU_IDX);
>> int clr = (7<<HFLAGS_IMMU_IDX) | (7<<HFLAGS_DMMU_IDX);
>> int old_idx = env->msr & clr;
>> clr = ~clr;
>> /* set new msr so we don't need to send the mmu_idx */
>> env->msr = (env->msr & clr) | new_idx;
>> ret = ppc_radix64_partition_scoped_xlate(...);
>> /* restore old mmu_idx */
>> env->msr = (env->msr & clr) | old_idx;
>
> No, this is silly.
>
> We need to do one of two things:
>   - make sure everything is inlined,
>   - reduce the number of arguments.
>
> We're currently passing in 9 arguments, which really is too many 
> already.  We should be using something akin to mmu_ctx_t, but probably 
> specific to radix64 without the random stuff collected for random 
> other mmu models.

That means we'd have to define radix_ctx_t (or however we call it) in 
radix64.h, setup the struct on ppc_xlate, then pass it to ppc_radix64_xlate.

 From looking at the code, it seems the most useful bits to put in the 
struct are: eaddr, g_addr, h_addr, {h,g}_prot, {g,h}_page_size, mmu_idx 
and guest_visible. They all seem reasonable to me, but I might be 
missing something again.

>
>
> r~
Another question: I know hash mmus don't have this problem, but since 
ppc_jumbo_xlate also uses mmu_idx, should we make those xlate user 
mmu_idxs as well? I tested and it doesn't make a time difference.
-- 
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Richard Henderson 2 years, 9 months ago
On 6/8/21 7:39 AM, Bruno Piazera Larsen wrote:
>> That's odd.  We already have more arguments than the number of argument 
>> registers...  A 5x slowdown is distinctly odd.
> I did some more digging and the problem is not with ppc_radix64_check_prot, the 
> problem is ppc_radix64_xlate, which currently has 7 arguments and we're 
> increasing to 8. 7 feels like the correct number, but I couldn't find docs 
> supporting it, so I could be wrong.

According to tcg/ppc/tcg-target.c.inc, there are 8 argument registers for ppc 
hosts.  But now I see you didn't actually say on which host you observed the 
problem...  It's 6 argument registers for x86_64 host.

> That means we'd have to define radix_ctx_t (or however we call it) in 
> radix64.h, setup the struct on ppc_xlate, then pass it to ppc_radix64_xlate.

Well, if you're going to change the xlate interface, you want to do that across 
all of them.  So, not call it radix_ctx_t.

>  From looking at the code, it seems the most useful bits to put in the struct 
> are: eaddr, g_addr, h_addr, {h,g}_prot, {g,h}_page_size, mmu_idx and 
> guest_visible. They all seem reasonable to me, but I might be missing something 
> again.

I don't think h/g should be in this struct.  I think h/g should use different 
struct instances, because they are different accesses.


r~

Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Bruno Piazera Larsen 2 years, 9 months ago
On 08/06/2021 12:35, Richard Henderson wrote:
> On 6/8/21 7:39 AM, Bruno Piazera Larsen wrote:
>>> That's odd.  We already have more arguments than the number of 
>>> argument registers...  A 5x slowdown is distinctly odd.
>> I did some more digging and the problem is not with 
>> ppc_radix64_check_prot, the problem is ppc_radix64_xlate, which 
>> currently has 7 arguments and we're increasing to 8. 7 feels like the 
>> correct number, but I couldn't find docs supporting it, so I could be 
>> wrong.
>
> According to tcg/ppc/tcg-target.c.inc, there are 8 argument registers 
> for ppc hosts.  But now I see you didn't actually say on which host 
> you observed the problem...  It's 6 argument registers for x86_64 host.

Oh, yes, sorry. I'm experiencing it in a POWER9 machine (ppc64le 
architecture). According to tcg this shouldn't be the issue, then, so 
idk if that's the real reason or not. All I know is that as soon as gcc 
can't optimize an argument away it happens (fprintf in radix64_xlate, 
using one of the mmuidx_* functions, defining those as macros).

I'll test it in my x86_64 machine and see if such a slowdown happens. 
It's not conclusive evidence, but the function is too complex for me to 
follow the disassembly if I can avoid it...

>
>> That means we'd have to define radix_ctx_t (or however we call it) in 
>> radix64.h, setup the struct on ppc_xlate, then pass it to 
>> ppc_radix64_xlate.
>
> Well, if you're going to change the xlate interface, you want to do 
> that across all of them.  So, not call it radix_ctx_t.
I wouldn't change ppc_xlate's interface, I'd set up the struct in that 
function and call ppc_radix64_xlate using the struct
>
>>  From looking at the code, it seems the most useful bits to put in 
>> the struct are: eaddr, g_addr, h_addr, {h,g}_prot, {g,h}_page_size, 
>> mmu_idx and guest_visible. They all seem reasonable to me, but I 
>> might be missing something again.
>
> I don't think h/g should be in this struct.  I think h/g should use 
> different struct instances, because they are different accesses.
Ok, makes sense
>
>
> r~
-- 
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>
Re: [RFC PATCH] target/ppc: fix address translation bug for hash table mmus
Posted by Bruno Piazera Larsen 2 years, 9 months ago
On 08/06/2021 13:37, Bruno Piazera Larsen wrote:
>
>
> On 08/06/2021 12:35, Richard Henderson wrote:
>> On 6/8/21 7:39 AM, Bruno Piazera Larsen wrote:
>>>> That's odd.  We already have more arguments than the number of 
>>>> argument registers...  A 5x slowdown is distinctly odd.
>>> I did some more digging and the problem is not with 
>>> ppc_radix64_check_prot, the problem is ppc_radix64_xlate, which 
>>> currently has 7 arguments and we're increasing to 8. 7 feels like 
>>> the correct number, but I couldn't find docs supporting it, so I 
>>> could be wrong.
>>
>> According to tcg/ppc/tcg-target.c.inc, there are 8 argument registers 
>> for ppc hosts.  But now I see you didn't actually say on which host 
>> you observed the problem...  It's 6 argument registers for x86_64 host.
>
> Oh, yes, sorry. I'm experiencing it in a POWER9 machine (ppc64le 
> architecture). According to tcg this shouldn't be the issue, then, so 
> idk if that's the real reason or not. All I know is that as soon as 
> gcc can't optimize an argument away it happens (fprintf in 
> radix64_xlate, using one of the mmuidx_* functions, defining those as 
> macros).
>
> I'll test it in my x86_64 machine and see if such a slowdown happens. 
> It's not conclusive evidence, but the function is too complex for me 
> to follow the disassembly if I can avoid it...
>
Test has been done: Slow down also happens on the x86_64 machine (but 
without change its already 360s, so idk if the slowdown is that 
dramatic), so it's _probably_ not going over the argument register 
count. I have no clue what could be. Still working on the struct version 
to see if anything changes.

-- 
Bruno Piazera Larsen
Instituto de Pesquisas ELDORADO 
<https://www.eldorado.org.br/?utm_campaign=assinatura_de_e-mail&utm_medium=email&utm_source=RD+Station>
Departamento Computação Embarcada
Analista de Software Trainee
Aviso Legal - Disclaimer <https://www.eldorado.org.br/disclaimer.html>