[PATCH v6 0/8] VSX MMA Implementation

Lucas Mateus Castro(alqotel) posted 8 patches 1 year, 11 months ago
Posted by Lucas Mateus Castro(alqotel) 1 year, 11 months ago
From: "Lucas Mateus Castro (alqotel)" <lucas.araujo@eldorado.org.br>

Based-on: https://gitlab.com/danielhb/qemu/-/tree/ppc-next

This patch series implements the Matrix-Multiply Assist (MMA)
instructions from PowerISA 3.1.

This series is based on Victor's "target/ppc: Fix FPSCR.FI bit" patch
series, because that series changed do_check_float_status, which is
called by the GER helper functions.

These and the VDIV/VMOD implementation are the last new PowerISA 3.1
instructions left to be implemented.

The XVFGER instructions accumulate the exception status and, at the end,
set the FPSCR and take a Program interrupt on a trap-enabled exception.
Previous versions were based on Victor's rework of FPU exceptions, but
as that patch was rejected, this version works around the fact that
OX/UX/XX and invalid-operation exceptions are handled in different
functions: it disables all enable bits, then re-enables them and calls
the mtfsf deferred exception helper.
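
The deferred handling described above can be sketched roughly as follows.
This is a toy model, not the code from the series: the ToyFpscr type, the
TOY_* bit positions and the toy_* functions are all hypothetical and do not
match QEMU's real FPSCR layout or helpers. It only shows the order of the
steps: mask the enables, accumulate the status of every element, restore
the enables, then do a single deferred check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits; NOT the real FPSCR bit positions. */
enum {
    TOY_VX = 1u << 0,   /* invalid operation */
    TOY_OX = 1u << 1,   /* overflow */
    TOY_UX = 1u << 2,   /* underflow */
    TOY_XX = 1u << 3,   /* inexact */
};

typedef struct {
    uint32_t status;    /* sticky exception bits */
    uint32_t enable;    /* trap-enable bits, same positions as status */
    bool fex;           /* summary: an enabled exception is pending */
} ToyFpscr;

/* Record one element's flags; returns true if it would trap right away. */
static bool toy_record(ToyFpscr *f, uint32_t flags)
{
    f->status |= flags;
    return (flags & f->enable) != 0;
}

/*
 * Process n elements with all enables masked, then restore the enables
 * and perform one deferred check. Returns true if the caller should
 * take a Program interrupt.
 */
static bool toy_ger(ToyFpscr *f, const uint32_t *elem_flags, int n)
{
    uint32_t saved = f->enable;

    f->enable = 0;                          /* 1: disable all enable bits */
    for (int i = 0; i < n; i++) {
        (void)toy_record(f, elem_flags[i]); /* never traps while masked */
    }
    f->enable = saved;                      /* 2: re-enable */
    f->fex = (f->status & f->enable) != 0;  /* 3: deferred check */
    return f->fex;
}
```

With this ordering every element contributes its status bits before any
trap is considered, which is the behaviour the paragraph above describes.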

v6 changes:
    - Rebased on ppc-next
    - Wrapped lines to stay <= 80 characters

v5 changes:
    - Changed VSXGER16 accumulation to negate the multiplication and
      accumulation in independent if's (if necessary) and sum their
      values.
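
As a rough illustration of the v5 change, the sketch below negates the
product and the accumulator in two independent if statements and then
sums them. It is an assumption-laden toy: toy_ger_accumulate and its
parameters are hypothetical names, and it uses plain C floats, so it
ignores the softfloat status, rounding mode and NaN handling that the
real helpers have to deal with.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: combine a product and an accumulator value.
 * neg_mul and neg_acc select the four variants
 * (pp, pn, np, nn) by negating each operand independently.
 */
static float toy_ger_accumulate(float prod, float acc,
                                bool neg_mul, bool neg_acc)
{
    if (neg_mul) {
        prod = -prod;   /* negative multiply */
    }
    if (neg_acc) {
        acc = -acc;     /* negative accumulate */
    }
    return prod + acc;  /* always a sum of the (possibly negated) values */
}
```

For example, with prod = 6.0 and acc = 1.5 the four flag combinations
give 7.5, 4.5, -4.5 and -7.5.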

v4 changes:
    - Changed VSXGER16 accumulation to always use float32_sum and negate
      the elements according to the type of accumulation

v3 changes:
    - GER helpers now use ppc_acc_t instead of ppc_vsr_t for passing acc
    - Removed do_ger_XX3 and updated the decodetree to pass the masks in
      32-bit instructions
    - Removed unnecessary rounding mode function
    - Moved float32_neg to fpu_helper.c and renamed it bfp32_negate to
      make it clearer that it's a 32-bit version of the PowerISA
      bfp_NEGATE
    - Negated accumulation is now a subtraction
    - Changed exception handling to disable all FPSCR enable bits so
      that all FPSCR bits (except FEX) are set correctly, then re-enable
      them and call do_fpscr_check_status to raise the exception
      accordingly and set FEX if necessary

v2 changes:
    - Changed VSXGER, VSXGER16 and XVIGER macros to functions
    - Set rounding mode in floating-point instructions based on RN
      before operations
    - Separated the accumulate and accumulate-with-saturation
      instructions into different helpers
    - Used FIELD, FIELD_EX32 and FIELD_DP32 for packing/unpacking masks
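
The mask packing mentioned in the last item can be pictured with the
stand-alone sketch below. It does not use QEMU's FIELD macros; the
toy_extract32/toy_deposit32 helpers and the field layout (XMSK in bits
0-3, YMSK in bits 4-7, PMSK in bits 8-15) are assumptions made for the
example, chosen only to show several masks travelling in one 32-bit word.

```c
#include <stdint.h>

/* Assumed layout, for illustration only. */
enum {
    TOY_XMSK_SHIFT = 0, TOY_XMSK_LEN = 4,
    TOY_YMSK_SHIFT = 4, TOY_YMSK_LEN = 4,
    TOY_PMSK_SHIFT = 8, TOY_PMSK_LEN = 8,
};

/* Read a len-bit field starting at bit 'shift' (len must be < 32). */
static uint32_t toy_extract32(uint32_t v, int shift, int len)
{
    return (v >> shift) & ((1u << len) - 1u);
}

/* Write a len-bit field starting at bit 'shift' (len must be < 32). */
static uint32_t toy_deposit32(uint32_t v, int shift, int len, uint32_t field)
{
    uint32_t mask = ((1u << len) - 1u) << shift;
    return (v & ~mask) | ((field << shift) & mask);
}

/* Pack the three masks into a single word to hand to a helper. */
static uint32_t toy_pack_masks(uint32_t xmsk, uint32_t ymsk, uint32_t pmsk)
{
    uint32_t m = 0;
    m = toy_deposit32(m, TOY_XMSK_SHIFT, TOY_XMSK_LEN, xmsk);
    m = toy_deposit32(m, TOY_YMSK_SHIFT, TOY_YMSK_LEN, ymsk);
    m = toy_deposit32(m, TOY_PMSK_SHIFT, TOY_PMSK_LEN, pmsk);
    return m;
}
```

The helper side would then call toy_extract32 with the same shift and
length to recover each mask.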


Joel Stanley (1):
  linux-user: Add PowerPC ISA 3.1 and MMA to hwcap

Lucas Mateus Castro (alqotel) (7):
  target/ppc: Implement xxm[tf]acc and xxsetaccz
  target/ppc: Implemented xvi*ger* instructions
  target/ppc: Implemented pmxvi*ger* instructions
  target/ppc: Implemented xvf*ger*
  target/ppc: Implemented xvf16ger*
  target/ppc: Implemented pmxvf*ger*
  target/ppc: Implemented [pm]xvbf16ger2*

 linux-user/elfload.c                |   4 +
 target/ppc/cpu.h                    |  13 ++
 target/ppc/fpu_helper.c             | 329 +++++++++++++++++++++++++++-
 target/ppc/helper.h                 |  33 +++
 target/ppc/insn32.decode            |  52 +++++
 target/ppc/insn64.decode            |  79 +++++++
 target/ppc/int_helper.c             | 130 +++++++++++
 target/ppc/internal.h               |  15 ++
 target/ppc/translate/vsx-impl.c.inc | 130 +++++++++++
 9 files changed, 783 insertions(+), 2 deletions(-)

-- 
2.31.1
Re: [PATCH v6 0/8] VSX MMA Implementation
Posted by Daniel Henrique Barboza 1 year, 11 months ago
Queued in gitlab.com/danielhb/qemu/tree/ppc-next. Thanks,


Daniel

On 5/24/22 11:05, Lucas Mateus Castro(alqotel) wrote:
> [...]