[RFC PATCH v2 0/3] hw/cxl: Poison get, inject, clear

Jonathan Cameron via posted 3 patches 1 year, 2 months ago
There is a newer version of this series
RFC mostly because we already have 4 CXL QEMU series waiting for review
ahead of this and I don't want to distract too much from those.

I posted an RFC of part of this series a long time back [1]. It's been more
or less entirely rewritten since then and gained support for mailbox-based
injection and clearing (including writing the cacheline). Note that I still
don't check for poison on direct reads of memory. That may be added in future
but isn't really that useful for testing the kernel code - it will end up
being the same as injecting a Machine Check or equivalent for the Host
Physical Address in question.

The series supports:
1) Injection of variable length poison regions via QMP (to fake real
   memory corruption and ensure we deal with odd overflow corner cases
   such as clearing the middle of a large region making the list overflow
   as we go from one long entry to two smaller entries).
2) Read of poison list via the CXL mailbox.
3) Injection via the poison injection mailbox command (limited to 64 byte
   entries)
4) Clearing of poison injected via either method.
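
To make the overflow corner case in 1) concrete, here is a minimal
standalone sketch. It is not the code from this series: PoisonEntry,
PoisonList, LIST_MAX, poison_add() and poison_clear_line() are all
hypothetical names, and the list limit is deliberately tiny. It assumes
entries are 64-byte aligned and do not overlap. Clearing one cacheline in
the middle of an entry replaces that entry with a head and a tail fragment,
so the entry count grows by one and a full list overflows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u /* clear granularity: one cacheline */
#define LIST_MAX  4u  /* hypothetical, deliberately tiny list limit */

/* Hypothetical poison entry covering [start, start + length). */
typedef struct {
    uint64_t start;
    uint64_t length;
} PoisonEntry;

typedef struct {
    PoisonEntry ent[LIST_MAX];
    size_t count;
    bool overflow; /* set when a fragment could not be recorded */
} PoisonList;

/* Append an entry, or flag overflow if the list is already full. */
static void poison_add(PoisonList *pl, uint64_t start, uint64_t length)
{
    if (length == 0) {
        return; /* nothing to record */
    }
    if (pl->count == LIST_MAX) {
        pl->overflow = true;
        return;
    }
    pl->ent[pl->count].start = start;
    pl->ent[pl->count].length = length;
    pl->count++;
}

/* Clear the single 64-byte line at addr, splitting any entry that holds it. */
static void poison_clear_line(PoisonList *pl, uint64_t addr)
{
    assert(addr % LINE_SIZE == 0);

    for (size_t i = 0; i < pl->count; i++) {
        PoisonEntry e = pl->ent[i];
        uint64_t end = e.start + e.length;

        if (addr < e.start || addr >= end) {
            continue;
        }
        /* Drop the matching entry by moving the last entry into its slot. */
        pl->ent[i] = pl->ent[pl->count - 1];
        pl->count--;
        /* Re-add whatever remains before and after the cleared line. */
        poison_add(pl, e.start, addr - e.start);
        poison_add(pl, addr + LINE_SIZE, end - (addr + LINE_SIZE));
        return;
    }
}
```

With a list already holding LIST_MAX entries, clearing a line in the middle
of one of them leaves the head fragment recorded and sets the overflow flag
when the tail fragment no longer fits.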

The implementation is meant to be a valid combination of the
implementation-defined choices the spec allows. There are a number of places
where it could be made more sophisticated, which we might consider in future:
* Fusing adjacent poison entries if the types match.
* Separate injection list and main poison list, to test out limits on
  injected poison list being smaller than the main list.
* Poison list overflow event (needs event log support in general)
* Connecting up to the poison list error record generation (rather complex
  and not needed for testing the current kernel handling).
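
As an illustration of the first item in the list above (fusing adjacent
entries), a possible shape for such a pass is sketched below. This is only
a sketch and is not part of the series: PoisonEntry, its type field and
poison_fuse() are hypothetical names, and the entries are assumed to be
sorted by start address and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical poison entry covering [start, start + length). */
typedef struct {
    uint64_t start;
    uint64_t length;
    uint8_t type; /* hypothetical tag, e.g. injected vs. internal */
} PoisonEntry;

/*
 * Fuse neighbouring entries in place when they touch and share a type.
 * Assumes ent[] is sorted by start and entries do not overlap.
 * Returns the new number of entries.
 */
static size_t poison_fuse(PoisonEntry *ent, size_t count)
{
    size_t out = 0; /* index of the last entry kept so far */

    assert(ent != NULL || count == 0);
    if (count == 0) {
        return 0;
    }
    for (size_t i = 1; i < count; i++) {
        PoisonEntry *prev = &ent[out];

        if (ent[i].type == prev->type &&
            prev->start + prev->length == ent[i].start) {
            /* Contiguous and same type: extend the previous entry. */
            prev->length += ent[i].length;
        } else {
            /* Gap or type change: keep as a separate entry. */
            ent[++out] = ent[i];
        }
    }
    return out + 1;
}
```

Running such a pass after each injection would keep the list shorter, at
the cost of no longer exercising the overflow paths as easily.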

As the kernel code is currently fairly simple, it is likely that the above
does not yet matter but who knows what will turn up in future!

Tested against Alison's latest kernel patches.
https://lore.kernel.org/linux-cxl/cover.1674070170.git.alison.schofield@intel.com/
https://lore.kernel.org/linux-cxl/cover.1674101475.git.alison.schofield@intel.com/
and the set timestamp patch:
https://lore.kernel.org/linux-cxl/20230130151327.32415-1-Jonathan.Cameron@huawei.com/

[1] https://lore.kernel.org/linux-cxl/20220620162056.16790-1-Jonathan.Cameron@huawei.com/

Jonathan Cameron (3):
  hw/cxl: QMP based poison injection support
  hw/cxl: Add poison injection via the mailbox.
  hw/cxl: Add clear poison mailbox command support.

 hw/cxl/cxl-mailbox-utils.c  | 199 ++++++++++++++++++++++++++++++++++++
 hw/mem/cxl_type3.c          |  92 +++++++++++++++++
 hw/mem/cxl_type3_stubs.c    |   3 +
 hw/mem/meson.build          |   2 +
 include/hw/cxl/cxl_device.h |  21 ++++
 qapi/cxl.json               |  11 ++
 6 files changed, 328 insertions(+)

-- 
2.37.2