From: Shiju Jose <shiju.jose@huawei.com>
This series adds:
1. ACPI-based firmware-first (FW-first) error injection, and
2. support for injecting ARM processor errors.
This QEMU-based error injection mechanism has proven very useful for testing
and upstreaming FW-first RAS changes in the kernel, as well as in user-space
tools, when suitable hardware is not available.
What is this?
- The ACPI and UEFI specifications define a means of notifying the OS of errors
that firmware has already handled (gathered up the data, reset the relevant
error tracking units, etc.), using a set of standard formats (UEFI
specification, Appendix N).
- ARM virt already supports the standard HEST ACPI table description of
Synchronous External Abort (SEA) for memory errors. This series builds on that
to add a GHESv2 / Generic Error Device / GPIO interrupt path for asynchronous
error reporting.
- The OS normally negotiates for control of error registers via _OSC.
Previously QEMU unconditionally granted control of these registers.
This series adds a machine parameter that allows the 'FW' to refuse the OS
control, and tracks whether the OS has asked for control.
Note that this code relies on the standard handshake; it is not correct if
the OS does not follow that flow. This can be hardened with some more AML
magic.
Alternatives:
- In theory we could emulate a management controller running appropriate
firmware and have that actually handle the errors. It is much easier instead
to intercept the errors before the error reporting messages are sent and the
results are logged in the root port's error registers. As far as the guest is
concerned, it doesn't matter whether these registers are handled via the
firmware or never got written in the first place (the guest isn't allowed to
touch these registers anyway!).
This is essentially the same argument as for why QEMU builds ACPI tables
itself rather than making that an EDK2 problem.
Why?
- The kernel supports both firmware-first and native RAS.
As only some vendors have adopted a FW-first model, and hardware
availability is limited, this code has proven challenging to test.
Why an RFC?
- We expect that adding this support to QEMU may be controversial.
- We probably need to figure out how to do this for x86 too, as apparently
people also want to use that architecture.
Reference to the previous series:
https://patchew.org/QEMU/20240205141940.31111-1-Jonathan.Cameron@huawei.com/
Mauro Carvalho has added instructions to the rasdaemon wiki on how to inject
ARM processor errors:
https://github.com/mchehab/rasdaemon/wiki/error-injection
The series is available here:
https://gitlab.com/shiju.jose/qemu/-/commits/arm-error-inject
Jonathan Cameron (3):
arm/virt: Wire up GPIO error source for ACPI / GHES
acpi/ghes: Support GPIO error source.
acpi/ghes: Add a logic to handle block addresses and FW first ARM
processor error injection
configs/targets/aarch64-softmmu.mak | 1 +
hw/acpi/ghes.c | 266 ++++++++++++++++++++++++++--
hw/arm/Kconfig | 4 +
hw/arm/arm_error_inject.c | 35 ++++
hw/arm/arm_error_inject_stubs.c | 18 ++
hw/arm/meson.build | 3 +
hw/arm/virt-acpi-build.c | 29 ++-
hw/arm/virt.c | 12 +-
include/hw/acpi/ghes.h | 3 +
include/hw/boards.h | 1 +
qapi/arm-error-inject.json | 49 +++++
qapi/meson.build | 1 +
qapi/qapi-schema.json | 1 +
13 files changed, 405 insertions(+), 18 deletions(-)
create mode 100644 hw/arm/arm_error_inject.c
create mode 100644 hw/arm/arm_error_inject_stubs.c
create mode 100644 qapi/arm-error-inject.json