include/exec/cpu_ldst.h | 40 ++++++++++++++++ accel/tcg/user-exec.c | 22 +++++++++ target/arm/tcg/helper-a64.c | 10 ++-- tests/tcg/multiarch/memset-fault.c | 77 ++++++++++++++++++++++++++++++ 4 files changed, 144 insertions(+), 5 deletions(-) create mode 100644 tests/tcg/multiarch/memset-fault.c
While looking into Zoltan's attempt to speed up ppc64 DCBZ (data cache block set to zero), I wondered what AArch64 was doing differently. It turned out that Arm is the only user of tlb_vaddr_to_host. None of the code sequences in use between AArch64, Power64 and S390X are 100% safe, with race conditions vs mmap et al, however, AArch64 is the only one that will fail this single threaded test case. Use of these new functions fixes the race condition as well, though I have not yet touched the other guests. I thought about exposing accel/tcg/user-retaddr.h for direct use from the targets, but perhaps these wrappers are cleaner. RFC? r~ Richard Henderson (2): accel/tcg: Introduce memset_ra, memmove_ra target/arm: Use memset_ra, memmove_ra in helper-a64.c include/exec/cpu_ldst.h | 40 ++++++++++++++++ accel/tcg/user-exec.c | 22 +++++++++ target/arm/tcg/helper-a64.c | 10 ++-- tests/tcg/multiarch/memset-fault.c | 77 ++++++++++++++++++++++++++++++ 4 files changed, 144 insertions(+), 5 deletions(-) create mode 100644 tests/tcg/multiarch/memset-fault.c -- 2.34.1
On Wed, 3 Jul 2024 at 00:43, Richard Henderson <richard.henderson@linaro.org> wrote: > > While looking into Zoltan's attempt to speed up ppc64 DCBZ > (data cache block set to zero), I wondered what AArch64 was > doing differently. It turned out that Arm is the only user > of tlb_vaddr_to_host. riscv also seems to use it in vext_ldff(), fwiw. -- PMM
On 7/8/24 07:25, Peter Maydell wrote: > On Wed, 3 Jul 2024 at 00:43, Richard Henderson > <richard.henderson@linaro.org> wrote: >> >> While looking into Zoltan's attempt to speed up ppc64 DCBZ >> (data cache block set to zero), I wondered what AArch64 was >> doing differently. It turned out that Arm is the only user >> of tlb_vaddr_to_host. > > riscv also seems to use it in vext_ldff(), fwiw. So it does, followed by a second probe for read. That's quite wrong... But you're right that it has a similar race condition. r~
On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: > While looking into Zoltan's attempt to speed up ppc64 DCBZ > (data cache block set to zero), I wondered what AArch64 was > doing differently. It turned out that Arm is the only user > of tlb_vaddr_to_host. > > None of the code sequences in use between AArch64, Power64 and S390X > are 100% safe, with race conditions vs mmap et al, however, AArch64 > is the only one that will fail this single threaded test case. Use > of these new functions fixes the race condition as well, though I > have not yet touched the other guests. > > I thought about exposing accel/tcg/user-retaddr.h for direct use > from the targets, but perhaps these wrappers are cleaner. RFC? > > > r~ > > > Richard Henderson (2): > accel/tcg: Introduce memset_ra, memmove_ra > target/arm: Use memset_ra, memmove_ra in helper-a64.c > > include/exec/cpu_ldst.h | 40 ++++++++++++++++ > accel/tcg/user-exec.c | 22 +++++++++ > target/arm/tcg/helper-a64.c | 10 ++-- > tests/tcg/multiarch/memset-fault.c | 77 > ++++++++++++++++++++++++++++++ > 4 files changed, 144 insertions(+), 5 deletions(-) > create mode 100644 tests/tcg/multiarch/memset-fault.c This sounds good to me. I haven't debugged it, but I wonder why doesn't s390x fail here. For XC with src == dst, it does access_memset() -> do_access_memset() -> memset() without setting the RA. And I don't think that anything around it sets the RA either.
On 7/4/24 07:50, Ilya Leoshkevich wrote: > On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: >> While looking into Zoltan's attempt to speed up ppc64 DCBZ >> (data cache block set to zero), I wondered what AArch64 was >> doing differently. It turned out that Arm is the only user >> of tlb_vaddr_to_host. >> >> None of the code sequences in use between AArch64, Power64 and S390X >> are 100% safe, with race conditions vs mmap et al, however, AArch64 >> is the only one that will fail this single threaded test case. Use >> of these new functions fixes the race condition as well, though I >> have not yet touched the other guests. >> >> I thought about exposing accel/tcg/user-retaddr.h for direct use >> from the targets, but perhaps these wrappers are cleaner. RFC? >> >> >> r~ >> >> >> Richard Henderson (2): >> accel/tcg: Introduce memset_ra, memmove_ra >> target/arm: Use memset_ra, memmove_ra in helper-a64.c >> >> include/exec/cpu_ldst.h | 40 ++++++++++++++++ >> accel/tcg/user-exec.c | 22 +++++++++ >> target/arm/tcg/helper-a64.c | 10 ++-- >> tests/tcg/multiarch/memset-fault.c | 77 >> ++++++++++++++++++++++++++++++ >> 4 files changed, 144 insertions(+), 5 deletions(-) >> create mode 100644 tests/tcg/multiarch/memset-fault.c > > This sounds good to me. > > I haven't debugged it, but I wonder why doesn't s390x fail here. > For XC with src == dst, it does access_memset() -> do_access_memset() > -> memset() without setting the RA. And I don't think that anything > around it sets the RA either. s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises the exception when it isn't. In contrast, for user-only, tlb_vaddr_to_host *only* performs the guest -> host address mapping, i.e. (addr + guest_base). r~
On 7/4/24 08:18, Richard Henderson wrote: > On 7/4/24 07:50, Ilya Leoshkevich wrote: >> On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: >>> While looking into Zoltan's attempt to speed up ppc64 DCBZ >>> (data cache block set to zero), I wondered what AArch64 was >>> doing differently. It turned out that Arm is the only user >>> of tlb_vaddr_to_host. >>> >>> None of the code sequences in use between AArch64, Power64 and S390X >>> are 100% safe, with race conditions vs mmap et al, however, AArch64 >>> is the only one that will fail this single threaded test case. Use >>> of these new functions fixes the race condition as well, though I >>> have not yet touched the other guests. >>> >>> I thought about exposing accel/tcg/user-retaddr.h for direct use >>> from the targets, but perhaps these wrappers are cleaner. RFC? >>> >>> >>> r~ >>> >>> >>> Richard Henderson (2): >>> accel/tcg: Introduce memset_ra, memmove_ra >>> target/arm: Use memset_ra, memmove_ra in helper-a64.c >>> >>> include/exec/cpu_ldst.h | 40 ++++++++++++++++ >>> accel/tcg/user-exec.c | 22 +++++++++ >>> target/arm/tcg/helper-a64.c | 10 ++-- >>> tests/tcg/multiarch/memset-fault.c | 77 >>> ++++++++++++++++++++++++++++++ >>> 4 files changed, 144 insertions(+), 5 deletions(-) >>> create mode 100644 tests/tcg/multiarch/memset-fault.c >> >> This sounds good to me. >> >> I haven't debugged it, but I wonder why doesn't s390x fail here. >> For XC with src == dst, it does access_memset() -> do_access_memset() >> -> memset() without setting the RA. And I don't think that anything >> around it sets the RA either. > > s390x uses probe_access_flags, which verifies the page is mapped and writable, and raises > the exception when it isn't. In contrast, for user-only, tlb_vaddr_to_host *only* > performs the guest -> host address mapping, i.e. (addr + guest_base). I should clarify: probe_access_flags verifies that the page is mapped *at that moment*, but does not take the mmap_lock. So the race is that the page can be unmapped by another thread after probe_access_flags and before the memset completes. r~
On Thu, 2024-07-04 at 14:48 -0700, Richard Henderson wrote: > On 7/4/24 08:18, Richard Henderson wrote: > > On 7/4/24 07:50, Ilya Leoshkevich wrote: > > > On Tue, 2024-07-02 at 16:41 -0700, Richard Henderson wrote: > > > > While looking into Zoltan's attempt to speed up ppc64 DCBZ > > > > (data cache block set to zero), I wondered what AArch64 was > > > > doing differently. It turned out that Arm is the only user > > > > of tlb_vaddr_to_host. > > > > > > > > None of the code sequences in use between AArch64, Power64 and > > > > S390X > > > > are 100% safe, with race conditions vs mmap et al, however, > > > > AArch64 > > > > is the only one that will fail this single threaded test case. > > > > Use > > > > of these new functions fixes the race condition as well, though > > > > I > > > > have not yet touched the other guests. > > > > > > > > I thought about exposing accel/tcg/user-retaddr.h for direct > > > > use > > > > from the targets, but perhaps these wrappers are cleaner. RFC? > > > > > > > > > > > > r~ > > > > > > > > > > > > Richard Henderson (2): > > > > accel/tcg: Introduce memset_ra, memmove_ra > > > > target/arm: Use memset_ra, memmove_ra in helper-a64.c > > > > > > > > include/exec/cpu_ldst.h | 40 ++++++++++++++++ > > > > accel/tcg/user-exec.c | 22 +++++++++ > > > > target/arm/tcg/helper-a64.c | 10 ++-- > > > > tests/tcg/multiarch/memset-fault.c | 77 > > > > ++++++++++++++++++++++++++++++ > > > > 4 files changed, 144 insertions(+), 5 deletions(-) > > > > create mode 100644 tests/tcg/multiarch/memset-fault.c > > > > > > This sounds good to me. > > > > > > I haven't debugged it, but I wonder why doesn't s390x fail here. > > > For XC with src == dst, it does access_memset() -> > > > do_access_memset() > > > -> memset() without setting the RA. And I don't think that > > > anything > > > around it sets the RA either. > > > > s390x uses probe_access_flags, which verifies the page is mapped > > and writable, and raises > > the exception when it isn't. In contrast, for user-only, > > tlb_vaddr_to_host *only* > > performs the guest -> host address mapping, i.e. (addr + > > guest_base). > > I should clarify: probe_access_flags verifies that the page is mapped > *at that moment*, > but does not take the mmap_lock. So the race is that the page can be > unmapped by another > thread after probe_access_flags and before the memset completes. I see, thanks. I completely overlooked the access_prepare() calls. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
© 2016 - 2024 Red Hat, Inc.