[PATCH v2 0/6] hw/cxl: Poison get, inject, clear

Jonathan Cameron via posted 6 patches 1 year, 1 month ago
There is a newer version of this series
[PATCH v2 0/6] hw/cxl: Poison get, inject, clear
Posted by Jonathan Cameron via 1 year, 1 month ago
v2: Thanks to Ira for review and also to Philippe as some of the
    changes follow through from comments on the precursor series.

- Fixed a bunch of endian issues. Note that QEMU CXL support currently only
  covers platforms that happen to be little endian, so these are more
  theoretical than bugs that can be triggered.
- Improve handling of mailbox-injected poison that overlaps with
  QMP-injected poison (which can be bigger).
- Tighter checks on alignment.
- Add 'Since' entries to qapi docs.
- Drop the CXLRetCode move out of this series as it isn't needed for this.
  Will appear in next series I post instead (Ira's event series)
- Drag down the st24_le_p() patch from Ira's Event series so we can use
  it in this series.

Note Alison has stated the kernel series will be post 6.3 material
so this one isn't quite as urgent as the patches it is based on.
However I think this series is in a good state (plus I have lots more queued
behind it), hence promoting it from RFC.

Changes since RFC v2: Thanks to Markus for review.
 - Improve documentation for QMP interface
 - Add better description of baseline series
 - Include precursor refactors around ret_code / CXLRetCode as this is now
   the first series in suggested merge order to rely on those.
 - Include Ira's cxl_device_get_timestamp() function as it was better than
   the equivalent in the RFC.

Based on the following series (in order):
1. [PATCH v4 00/10] hw/cxl: CXL emulation cleanups and minor fixes for upstream
2. [PATCH v6 0/8] hw/cxl: RAS error emulation and injection
3. [PATCH v2 0/2] hw/cxl: Passthrough HDM decoder emulation
4. [PATCH v4 0/2] hw/mem: CXL Type-3 Volatile Memory Support

Based-on: Message-id: 20230206172816.8201-1-Jonathan.Cameron@huawei.com
Based-on: Message-id: 20230227112751.6101-1-Jonathan.Cameron@huawei.com
Based-on: Message-id: 20230227153128.8164-1-Jonathan.Cameron@huawei.com
Based-on: Message-id: 20230227163157.6621-1-Jonathan.Cameron@huawei.com

The series supports:
1) Injection of variable length poison regions via QMP (to fake real
   memory corruption and ensure we deal with odd overflow corner cases
   such as clearing the middle of a large region making the list overflow
   as we go from one long entry to two smaller entries).
2) Read of poison list via the CXL mailbox.
3) Injection via the poison injection mailbox command (limited to 64 byte
   entries)
4) Clearing of poison injected via either method.
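
The overflow corner case in 1) can be sketched as follows. This is an
illustration only, not the code in hw/cxl/cxl-mailbox-utils.c: PoisonEntry,
PoisonList, MAX_ENTRIES, poison_add() and poison_clear_line() are all
hypothetical, and the sketch assumes entries start on 64 byte boundaries and
that a clear always covers exactly one 64 byte line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LINE 64u        /* assumed granularity of mailbox inject / clear */
#define MAX_ENTRIES 4   /* deliberately tiny, hypothetical list limit */

typedef struct { uint64_t start, len; } PoisonEntry;   /* hypothetical */
typedef struct {
    PoisonEntry e[MAX_ENTRIES];
    size_t n;
    bool overflow;
} PoisonList;

/* Append an entry, or just record overflow if the list is already full. */
static void poison_add(PoisonList *l, uint64_t start, uint64_t len)
{
    assert(l != NULL);
    if (l->n == MAX_ENTRIES) {
        l->overflow = true;
        return;
    }
    l->e[l->n++] = (PoisonEntry){ start, len };
}

/* Clear one 64 byte line at dpa. Returns false if dpa is misaligned. */
static bool poison_clear_line(PoisonList *l, uint64_t dpa)
{
    if (dpa % LINE) {
        return false;
    }
    for (size_t i = 0; i < l->n; i++) {
        PoisonEntry old = l->e[i];
        uint64_t end = old.start + old.len;

        if (dpa < old.start || dpa >= end) {
            continue;
        }
        /* Drop the old entry by moving the last entry into its slot. */
        l->e[i] = l->e[--l->n];
        /* Re-add whatever is left before and after the cleared line. */
        if (dpa > old.start) {
            poison_add(l, old.start, dpa - old.start);
        }
        if (dpa + LINE < end) {
            poison_add(l, dpa + LINE, end - (dpa + LINE));
        }
        break;
    }
    return true;
}
```

Clearing a line in the middle of one entry removes one entry and adds two, so
a list that was exactly full before the clear overflows afterwards.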

The implementation is meant to be a valid combination of impdef choices
based on what the spec allowed. There are a number of places where it could
be made more sophisticated that we might consider in future:
* Fusing adjacent poison entries if the types match.
* Separate injection list and main poison list, to test out limits on
  injected poison list being smaller than the main list.
* Poison list overflow event (needs event log support in general)
* Connecting up to the poison list error record generation (rather complex
  and not needed for testing the current kernel handling).
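
As a rough idea of what fusing adjacent entries could look like, here is a
sketch under stated assumptions. It is not part of this series: PoisonEntry
and poison_fuse() are hypothetical, and the list is assumed to be sorted by
start address with no overlapping entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout: a range plus a poison source type. */
typedef struct { uint64_t start, len; uint8_t type; } PoisonEntry;

/*
 * Merge neighbouring entries in place when they touch and share a type.
 * Expects e[] sorted by start; returns the new number of entries.
 */
static size_t poison_fuse(PoisonEntry *e, size_t n)
{
    size_t out = 0;

    assert(e != NULL || n == 0);
    if (n == 0) {
        return 0;
    }
    for (size_t i = 1; i < n; i++) {
        if (e[i].type == e[out].type &&
            e[out].start + e[out].len == e[i].start) {
            e[out].len += e[i].len;     /* extend the current entry */
        } else {
            e[++out] = e[i];            /* start a new output entry */
        }
    }
    return out + 1;
}
```

Entries of different types are kept apart even when they are contiguous,
matching the "if the types match" condition above.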

As the kernel code is currently fairly simple, it is likely that the above
does not yet matter but who knows what will turn up in future!

Kernel patches:
 [PATCH v7 0/6] CXL Poison List Retrieval & Tracing
 cover.1676685180.git.alison.schofield@intel.com
 [PATCH v2 0/6] cxl: CXL Inject & Clear Poison
 cover.1674101475.git.alison.schofield@intel.com


Ira Weiny (2):
  hw/cxl: Introduce cxl_device_get_timestamp() utility function
  bswap: Add the ability to store to an unaligned 24 bit field

Jonathan Cameron (4):
  hw/cxl: rename mailbox return code type from ret_code to CXLRetCode
  hw/cxl: QMP based poison injection support
  hw/cxl: Add poison injection via the mailbox.
  hw/cxl: Add clear poison mailbox command support.

 hw/cxl/cxl-device-utils.c   |  15 ++
 hw/cxl/cxl-mailbox-utils.c  | 285 ++++++++++++++++++++++++++++++------
 hw/mem/cxl_type3.c          |  92 ++++++++++++
 hw/mem/cxl_type3_stubs.c    |   6 +
 include/hw/cxl/cxl_device.h |  23 +++
 include/qemu/bswap.h        |  23 +++
 qapi/cxl.json               |  18 +++
 7 files changed, 420 insertions(+), 42 deletions(-)

-- 
2.37.2
Re: [PATCH v2 0/6] hw/cxl: Poison get, inject, clear
Posted by Alison Schofield 1 year, 1 month ago
On Mon, Feb 27, 2023 at 05:03:05PM +0000, Jonathan Cameron wrote:

Hi Jonathan,
Can you share your repo with this support?  How about your qemu cmdline?
I'm more of a 'try it out' type of a reviewer for qemu changes.
Thanks,
Alison

Re: [PATCH v2 0/6] hw/cxl: Poison get, inject, clear
Posted by Jonathan Cameron via 1 year, 1 month ago
On Wed, 1 Mar 2023 17:15:56 -0800
Alison Schofield <alison.schofield@intel.com> wrote:

> On Mon, Feb 27, 2023 at 05:03:05PM +0000, Jonathan Cameron wrote:
> 
> Hi Jonathan,
> Can you share your repo with this support?  How about your qemu cmdline?
> I'm more of a 'try it out' type of a reviewer for qemu changes.

https://gitlab.com/jic23/qemu/-/tree/cxl-2023-02-28
is latest tree with this on.

A completely non-minimal command line I'm using for a single device on that
tree is:
 qemu-system-aarch64 -M virt,nvdimm=on,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 -cpu max -smp 4 \
 -kernel Image \
 -drive if=none,file=full.qcow2,format=qcow2,id=hd \
 -device pcie-root-port,id=root_port1 -device virtio-blk-pci,drive=hd \
 -netdev type=user,id=mynet,hostfwd=tcp::5555-:22 \
 -qmp tcp:localhost:4445,server=on,wait=off \
 -device virtio-net-pci,netdev=mynet,id=bob \
 -nographic -no-reboot -append 'earlycon root=/dev/vda2 fsck.mode=skip maxcpus=4 tp_printk' \
 -monitor telnet:127.0.0.1:1234,server,nowait -bios QEMU_EFI.fd \
 -object memory-backend-ram,size=4G,id=mem0 \
 -numa node,nodeid=0,cpus=0-3,memdev=mem0 \
 -object memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/t4_p.raw,size=1G,align=1G \
 -object memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/t4_v.raw,size=1G,align=1G \
 -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/t4_plsa.raw,size=1M,align=1M \
 -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1,hdm_for_passthrough=true \
 -device cxl-rp,port=0,bus=cxl.1,id=cxl_root_port0,chassis=0,slot=2 \
 -device cxl-rp,port=1,bus=cxl.1,id=cxl_root_port1,chassis=0,slot=3 \
 -device cxl-type3,bus=cxl_root_port0,persistent-memdev=cxl-mem1,volatile-memdev=cxl-mem2,id=cxl-mem0,lsa=cxl-lsa1,sn=3 \
 -machine cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=1k,cxl-fmw.0.restrictions=0x6,cxl-fmw.1.targets.0=cxl.1,cxl-fmw.1.size=4G,cxl-fmw.1.interleave-granularity=1k,cxl-fmw.1.restrictions=0xa

A few things in here are to test other new features not posted yet.
1) For multiple HDM decoders, the two root ports are needed to avoid a kernel
   bug around passthrough decoders.
2) The restrictions on CFMWS are also new and allow restricting a CFMWS to
   volatile or non-volatile.

Other than needing the qmp config line, this poison injection doesn't need anything
specific.

Guessing you might want to use an x86 host as well ;) 
qemu-system-x86_64 -M q35,cxl=on,sata=off,smbus=off -m 4g,maxmem=64G,slots=8 -cpu max -smp 4 \

Drop those bits and it shouldn't make any difference for poison injections.

For injection in QMP 

telnet localhost 4445

and send
{ "execute": "qmp_capabilities" }
first followed by poison injection commands like

{ "execute": "cxl-inject-poison",
    "arguments": {
         "path": "/machine/peripheral/cxl-mem0",
         "start": 1024,
         "length": 64
    }
}

The rest of the testing uses your kernel injection and clearing patches.

Thanks,

Jonathan

