accel/tcg/translator.c question about translator_access

Sid Manning posted 1 patch 1 year, 1 month ago
accel/tcg/translator.c question about translator_access
Posted by Sid Manning 1 year, 1 month ago
There is an assert in translator_access that I hit while running on a version of QEMU integrated into a Virtual Platform.

Since this function can return null anyway I tried the following experiment:

--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -172,7 +172,9 @@ static void *translator_access(CPUArchState *env, DisasContextBase *db,
             tb_page_addr_t phys_page =
                 get_page_addr_code_hostp(env, base, &db->host_addr[1]);
             /* We cannot handle MMIO as second page. */
-            assert(phys_page != -1);
+            if(phys_page == -1) {
+                return NULL;
+            }
             tb_set_page_addr1(tb, phys_page);
#ifdef CONFIG_USER_ONLY
             page_protect(end);

which avoided the issue and the test ran to completion.  What is this assert trying to catch?
Re: accel/tcg/translator.c question about translator_access
Posted by Richard Henderson 1 year, 1 month ago
On 1/31/23 17:06, Sid Manning wrote:
> There is an assert in translator_access that I hit while running on a version of QEMU 
> integrated into a Virtual Platform.
> 
> Since this function can return null anyway I tried the following experiment:
...
> -            assert(phys_page != -1);
> +            if(phys_page == -1) {
> +                return NULL;
> +            }
...
> which avoided the issue and the test ran to completion.  What is this assert trying to catch?


One half of the instruction in ram and the other half of the instruction in mmio.

If the entire instruction is in mmio, then we correctly translate, but do not cache the 
result (since the io can produce different results on every access).  But if we have 
started caching the result, because we start in ram, then we will incorrectly cache the 
mmio access.

This really should never happen.  How did it occur?


r~

RE: accel/tcg/translator.c question about translator_access
Posted by Sid Manning 1 year, 1 month ago

> -----Original Message-----
> From: Richard Henderson <richard.henderson@linaro.org>
> Sent: Tuesday, January 31, 2023 11:46 PM
> To: Sid Manning <sidneym@quicinc.com>; qemu-devel@nongnu.org
> Cc: Mark Burton <mburton@qti.qualcomm.com>; Brian Cain
> <bcain@quicinc.com>; Matheus Bernardino
> <mathbern@qti.qualcomm.com>
> Subject: Re: accel/tcg/translator.c question about translator_access
> 
> WARNING: This email originated from outside of Qualcomm. Please be wary
> of any links or attachments, and do not enable macros.
> 
> On 1/31/23 17:06, Sid Manning wrote:
> > There is an assert in translator_access that I hit while running on a
> > version of QEMU integrated into a Virtual Platform.
> >
> > Since this function can return null anyway I tried the following experiment:
> ...
> > -            assert(phys_page != -1);
> > +            if(phys_page == -1) {
> > +                return NULL;
> > +            }
> ...
> > which avoided the issue and the test ran to completion.  What is this assert
> trying to catch?
> 
> 
> One half of the instruction in ram and the other half of the instruction in
> mmio.
> 
> If the entire instruction is in mmio, then we correctly translate, but do not
> cache the result (since the io can produce different results on every access).
> But if we have started caching the result, because we start in ram, then we
> will incorrectly cache the mmio access.
> 
> This really should never happen.  How did it occur?

This might be a synchronization problem with System-C, a packet is straddling a page boundary.  Software running on the ARM is dispatching code to run on the DSP.  I have only seen this when the cores are interacting in this way.

PS: Sorry for the delayed response, I was unexpectedly out of the office last week due to an ice storm and power outage.

> 
> 
> r~