From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 01/23] vfio-user: introduce vfio-user protocol specification
Date: Wed, 1 Feb 2023 21:55:37 -0800

From: Thanos Makatos

This patch introduces the vfio-user protocol specification (formerly known
as VFIO-over-socket), which is designed to allow devices to be emulated
outside QEMU, in a separate process.
vfio-user reuses the existing VFIO defines, structs and concepts.

It was previously discussed as an RFC in:
"RFC: use VFIO over a UNIX domain socket to implement device offloading"

Signed-off-by: John G Johnson
Signed-off-by: Thanos Makatos
Signed-off-by: John Levon
---
 docs/devel/index-internals.rst |    1 +
 docs/devel/vfio-user.rst       | 1522 ++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS                    |    6 +
 3 files changed, 1529 insertions(+)
 create mode 100644 docs/devel/vfio-user.rst

diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index e1a93df..0ecb5c6 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -17,5 +17,6 @@ Details about QEMU's various subsystems including how to add features to them.
    s390-dasd-ipl
    tracing
    vfio-migration
+   vfio-user
    writing-monitor-commands
    virtio-backends

.. include::

********************************
vfio-user Protocol Specification
********************************

--------------
Version_ 0.9.1
--------------

.. contents:: Table of Contents

Introduction
============
vfio-user is a protocol that allows a device to be emulated in a separate
process outside of a Virtual Machine Monitor (VMM). vfio-user devices consist
of a generic VFIO device type, living inside the VMM, which we call the client,
and the core device implementation, living outside the VMM, which we call the
server.

The vfio-user specification is partly based on the
`Linux VFIO ioctl interface `_.

VFIO is a mature and stable API, backed by an extensively used framework. The
existing VFIO client implementation in QEMU (``qemu/hw/vfio/``) can be largely
re-used, though nothing in this specification requires that particular
implementation.
None of the VFIO kernel modules are required to support the protocol, on
either the client or the server side. Some source definitions in VFIO are
re-used for vfio-user.

The main idea is to allow a virtual device to function in a separate process on
the same host over a UNIX domain socket. A UNIX domain socket (``AF_UNIX``) is
chosen because file descriptors can be trivially sent over it, which in turn
allows:

* Sharing of client memory for DMA with the server.
* Sharing of server memory with the client for fast MMIO.
* Efficient sharing of eventfds for triggering interrupts.

Other socket types could be used to allow the server to run in a separate
guest on the same host (``AF_VSOCK``) or remotely (``AF_INET``). Theoretically,
the underlying transport does not have to be a socket; however, we do not
examine such alternatives. In this protocol version we focus on using a UNIX
domain socket and introduce basic support for the other two types of sockets
without considering performance implications.

While passing file descriptors is desirable for performance reasons, support
for it is not necessary for either the client or the server in order to
implement the protocol. There is always an in-band, message-passing fallback
mechanism.

Overview
========

VFIO is a framework that allows a physical device to be securely passed through
to a user space process; the device-specific kernel driver does not drive the
device at all. Typically, the user space process is a VMM and the device is
passed through to it in order to achieve high performance. VFIO provides an API
and the required functionality in the kernel. QEMU has adopted VFIO to allow a
guest to directly access physical devices, instead of emulating them in
software.

vfio-user reuses the core VFIO concepts defined in its API, but implements them
as messages to be sent over a socket.
It does not change the kernel-based VFIO
in any way; in fact, none of the VFIO kernel modules need to be loaded to use
vfio-user. It is also possible for the client to concurrently use the current
kernel-based VFIO for one device, and vfio-user for another device.

VFIO Device Model
-----------------

A device under VFIO presents a standard interface to the user process. Many of
the VFIO operations in the existing interface use the ``ioctl()`` system call,
and this document refers to that existing interface as the ``ioctl()``
implementation.

The following sections describe the set of messages that implement the vfio-user
interface over a socket. In many cases, the messages are analogous to data
structures used in the ``ioctl()`` implementation. Messages derived from an
``ioctl()`` have a name derived from the ``ioctl()`` command name. E.g., the
``VFIO_DEVICE_GET_INFO`` ``ioctl()`` command becomes a
``VFIO_USER_DEVICE_GET_INFO`` message. The purpose of this reuse is to share as
much code as feasible with the ``ioctl()`` implementation.

Connection Initiation
^^^^^^^^^^^^^^^^^^^^^

After the client connects to the server, the initial client message is
``VFIO_USER_VERSION``, which proposes a protocol version and a set of
capabilities to apply to the session. The server replies with a compatible
version and the set of capabilities it supports, or closes the connection if it
cannot support the advertised version.

Device Information
^^^^^^^^^^^^^^^^^^

The client uses a ``VFIO_USER_DEVICE_GET_INFO`` message to query the server for
information about the device. This information includes:

* the device type and whether it supports reset (``VFIO_DEVICE_FLAGS_``),
* the number of device regions, and
* the number of interrupt types the device supports.
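As a sketch of how a client might interpret a ``VFIO_USER_DEVICE_GET_INFO``
reply, the C fragment below assumes that the reply payload consists of the four
32-bit fields ``argsz``, ``flags``, ``num_regions`` and ``num_irqs``, mirroring
the start of Linux's ``struct vfio_device_info``. That layout, the
``device_summary`` type and ``summarize_device_info()`` are hypothetical; the
flag values and ``VFIO_PCI_NUM_REGIONS`` are copied from ``<linux/vfio.h>``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in Linux <linux/vfio.h>. */
#define VFIO_DEVICE_FLAGS_RESET (1u << 0) /* device supports reset */
#define VFIO_DEVICE_FLAGS_PCI   (1u << 1) /* device is a PCI device */
#define VFIO_PCI_NUM_REGIONS    9u        /* standard PCI region indexes 0-8 */

/* ASSUMED reply layout (hypothetical), mirroring struct vfio_device_info. */
struct vfio_user_device_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t num_regions;
    uint32_t num_irqs;
};
static_assert(sizeof(struct vfio_user_device_info) == 16, "no padding expected");

/* Hypothetical summary a client could keep after the query. */
struct device_summary {
    bool can_reset;
    bool is_pci;
    uint32_t num_irq_types;
    uint32_t extra_regions; /* device-specific regions beyond indexes 0-8 */
};

/* Returns 0 on success, or -1 if a PCI device reports fewer regions than the
 * standard PCI set, which this sketch treats as a malformed reply. */
static int summarize_device_info(const struct vfio_user_device_info *info,
                                 struct device_summary *out)
{
    out->can_reset = (info->flags & VFIO_DEVICE_FLAGS_RESET) != 0;
    out->is_pci = (info->flags & VFIO_DEVICE_FLAGS_PCI) != 0;
    out->num_irq_types = info->num_irqs;
    out->extra_regions = 0;
    if (out->is_pci) {
        if (info->num_regions < VFIO_PCI_NUM_REGIONS)
            return -1;
        out->extra_regions = info->num_regions - VFIO_PCI_NUM_REGIONS;
    }
    return 0;
}
```

A non-zero ``extra_regions`` count tells the client that it should query the
additional region indexes with ``VFIO_USER_DEVICE_GET_REGION_INFO`` to discover
their device-specific types.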

Region Information
^^^^^^^^^^^^^^^^^^

The client uses ``VFIO_USER_DEVICE_GET_REGION_INFO`` messages to query the
server for information about the device's regions. This information describes:

* read and write permissions, whether the region can be memory mapped, and
  whether it supports additional capabilities (``VFIO_REGION_INFO_CAP_``);
* region index, size, and offset.

When a device region can be mapped by the client, the server provides a file
descriptor which the client can ``mmap()``. The server is responsible for
polling for client updates to memory mapped regions.

Region Capabilities
"""""""""""""""""""

Some regions have additional capabilities that cannot be described adequately
by the region info data structure. These capabilities are returned in the
region info reply in a list similar to the PCI capabilities in a PCI device's
configuration space.

Sparse Regions
""""""""""""""
A region can be memory-mappable in whole or in part. When only a subset of a
region can be mapped by the client, a ``VFIO_REGION_INFO_CAP_SPARSE_MMAP``
capability is included in the region info reply. This capability describes
which portions can be mapped by the client.

.. Note::
   For example, in a virtual NVMe controller, sparse regions can be used so
   that accesses to the NVMe registers (at the beginning of BAR0) are trapped
   (an infrequent event), while allowing direct access to the doorbells (an
   extremely frequent event, as every I/O submission requires a write to
   BAR0), which are located in the page following the NVMe registers in BAR0.

Device-Specific Regions
"""""""""""""""""""""""

A device can define regions in addition to the standard ones (e.g. PCI indexes
0-8). This is achieved by including a ``VFIO_REGION_INFO_CAP_TYPE`` capability
in the region info reply of a device-specific region. Such regions are reflected
in ``struct vfio_user_device_info.num_regions``.
Thus, for PCI devices, this value can be equal to or greater than
``VFIO_PCI_NUM_REGIONS``.

Region I/O via file descriptors
-------------------------------

For unmapped regions, region I/O from the client is done via
``VFIO_USER_REGION_READ/WRITE``. As an optimization, ioeventfds or ioregionfds
may be configured for sub-regions of some regions. A client may request
information on these sub-regions via ``VFIO_USER_DEVICE_GET_REGION_IO_FDS``; by
configuring the returned file descriptors as ioeventfds or ioregionfds, the
server can be notified of I/O directly (for example, by KVM) without a round
trip through the client.

Interrupts
^^^^^^^^^^

The client uses ``VFIO_USER_DEVICE_GET_IRQ_INFO`` messages to query the server
for the device's interrupt types. The interrupt types are specific to the bus
the device is attached to, and the client is expected to know the capabilities
of each interrupt type. The server can signal an interrupt by directly injecting
interrupts into the guest via an event file descriptor. The client configures
how the server signals an interrupt with ``VFIO_USER_DEVICE_SET_IRQS`` messages.

Device Read and Write
^^^^^^^^^^^^^^^^^^^^^

When the guest executes load or store operations to an unmapped device region,
the client forwards these operations to the server with
``VFIO_USER_REGION_READ`` or ``VFIO_USER_REGION_WRITE`` messages. The server
replies with data from the device on read operations or an acknowledgement on
write operations. See `Read and Write Operations`_.

Client memory access
--------------------

The client uses ``VFIO_USER_DMA_MAP`` and ``VFIO_USER_DMA_UNMAP`` messages to
inform the server of the valid DMA ranges that the server can access on behalf
of a device (typically, VM guest memory). DMA memory may be accessed by the
server via ``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE`` messages over the
socket.
In this case, the "DMA" part of the message names is a misnomer, since the
accesses travel over the socket.

Actual direct memory access of client memory from the server is possible if the
client provides file descriptors the server can ``mmap()``. Note that ``mmap()``
privileges cannot be revoked by the client; therefore, file descriptors should
only be exported in environments where the client trusts the server not to
corrupt guest memory.

See `Read and Write Operations`_.

Client/server interactions
==========================

Socket
------

A server can serve:

1) one or more clients, and/or
2) one or more virtual devices, belonging to one or more clients.

The current protocol specification requires a dedicated socket per
client/server connection. It is a server-side implementation detail whether a
single server handles multiple virtual devices from the same or multiple
clients. The location of the socket is implementation-specific. Multiplexing
clients, devices, and servers over the same socket is not supported in this
version of the protocol.

Authentication
--------------

For ``AF_UNIX``, we rely on OS mandatory access controls on the socket files;
therefore, it is up to the management layer to set up the socket as required.
Socket types that span guests or hosts will require a proper authentication
mechanism. Defining that mechanism is deferred to a future version of the
protocol.

Command Concurrency
-------------------

A client may pipeline multiple commands without waiting for previous command
replies. The server processes commands in the order they are received. A
consequence is that if a client issues a command with the *No_reply* bit set,
then subsequently issues a command without *No_reply*, the older command will
have been processed before the server sends the reply to the younger command.
The client must be aware of the device's capability to process
concurrent commands if pipelining is used. For example, pipelining allows
multiple client threads to concurrently access device regions; the client must
ensure these accesses obey device semantics.

An example is a frame buffer device, where the device may allow concurrent
access to different areas of video memory, but may have indeterminate behavior
if concurrent accesses are performed to command or status registers.

Note that unrelated messages sent from the server to the client can appear
between a client-to-server request and its reply, and vice versa.

Implementers should be prepared for certain commands to exhibit potentially
unbounded latencies. For example, ``VFIO_USER_DEVICE_RESET`` may take an
arbitrarily long time to complete; clients should take care not to block
unnecessarily.

Socket Disconnection Behavior
-----------------------------
The server and the client can disconnect from each other, either intentionally
or unexpectedly. Both the client and the server need to know how to handle such
events.

Server Disconnection
^^^^^^^^^^^^^^^^^^^^
A server disconnecting from the client may indicate that:

1) A virtual device has been restarted, either intentionally (e.g. because of a
   device update) or unintentionally (e.g. because of a crash).
2) A virtual device has been shut down with no intention to be restarted.

It is impossible for the client to know whether a failure is intermittent or
innocuous and should be retried; therefore, the client should reset the VFIO
device when it detects that the socket has been disconnected. Error recovery
will be driven by the guest's device error handling behavior.

Client Disconnection
^^^^^^^^^^^^^^^^^^^^
The client disconnecting from the server primarily means that the client
has exited.
Currently, this means that the guest is shut down, so the device is no
longer needed and the server can automatically exit. However, there can be
cases where a client disconnection should not result in a server exit:

1) A single server serving multiple clients.
2) A multi-process QEMU upgrading itself step by step, which is not yet
   implemented.

Therefore, in order for the protocol to be forward compatible, the server
should respond to a client disconnection as follows:

 - all client memory regions are unmapped and cleaned up (including closing any
   passed file descriptors)
 - all IRQ file descriptors passed from the old client are closed
 - the device state should otherwise be retained

The expectation is that when a client reconnects, it will re-establish IRQ and
client memory mappings.

If anything happens to the client (for example, QEMU really has exited), the
control stack will know about it and can clean up resources accordingly.

Security Considerations
-----------------------

Speaking generally, vfio-user clients should not trust servers, and vice versa.
Standard tools and mechanisms should be used on both sides to validate input and
protect against denial of service scenarios, buffer overflows, etc.

Request Retry and Response Timeout
----------------------------------
A failed command is a command that has been successfully sent and has been
responded to with an error code. Failure to send the command in the first place
(e.g. because the socket is disconnected) is a different type of error,
covered earlier in `Socket Disconnection Behavior`_.

.. Note::
   QEMU's VFIO retries certain operations if they fail. While this makes sense
   for real HW, we don't know for sure whether it makes sense for virtual
   devices.

Defining a retry and timeout scheme is deferred to a future version of the
protocol.

Message sizes
-------------

Some requests have an ``argsz`` field.
In a request, it defines the maximum
expected reply payload size, which should be at least the size of the fixed
reply payload headers defined here. The *request* payload size is defined by the
usual ``msg_size`` field in the header, not by the ``argsz`` field.

In a reply, the server sets the ``argsz`` field to the size needed for the full
reply payload. This may be less than the requested maximum size. It may also be
larger than the requested maximum size: in that case, the full payload is not
included in the reply, but the ``argsz`` field in the reply indicates the needed
size, allowing the client to allocate a larger buffer for holding the reply
before trying again.

In addition, during negotiation (see `Version`_), the client and server may
each specify a ``max_data_xfer_size`` value; this defines the maximum data that
may be read or written via one of the ``VFIO_USER_DMA/REGION_READ/WRITE``
messages; see `Read and Write Operations`_.

Protocol Specification
======================

To distinguish them from the base VFIO symbols, all vfio-user symbols are
prefixed with ``vfio_user`` or ``VFIO_USER``. In this revision, all data is in
the endianness of the host system, although this may be relaxed in future
revisions in cases where the client and server run on different hosts
with different endianness.

Unless otherwise specified, all sizes should be presumed to be in bytes.

.. _Commands:

Commands
--------
The following table lists the vfio-user message command IDs, and whether each
command is sent by the client or by the server.

====================================== ========= =================
Name                                   Command   Request Direction
====================================== ========= =================
``VFIO_USER_VERSION``                  1         client -> server
``VFIO_USER_DMA_MAP``                  2         client -> server
``VFIO_USER_DMA_UNMAP``                3         client -> server
``VFIO_USER_DEVICE_GET_INFO``          4         client -> server
``VFIO_USER_DEVICE_GET_REGION_INFO``   5         client -> server
``VFIO_USER_DEVICE_GET_REGION_IO_FDS`` 6         client -> server
``VFIO_USER_DEVICE_GET_IRQ_INFO``      7         client -> server
``VFIO_USER_DEVICE_SET_IRQS``          8         client -> server
``VFIO_USER_REGION_READ``              9         client -> server
``VFIO_USER_REGION_WRITE``             10        client -> server
``VFIO_USER_DMA_READ``                 11        server -> client
``VFIO_USER_DMA_WRITE``                12        server -> client
``VFIO_USER_DEVICE_RESET``             13        client -> server
``VFIO_USER_REGION_WRITE_MULTI``       15        client -> server
====================================== ========= =================

Header
------

All messages, both command messages and reply messages, are preceded by a
16-byte header that contains basic information about the message. The header is
followed by message-specific data described in the sections below.
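The header described here, with the field offsets and flag bits given in the
table and list that follow, can be sketched as a C structure. Because this
revision uses host endianness and every field is naturally aligned, a plain
structure has no padding on common ABIs and its bytes are the wire image. The
``HDR_*`` macro names, the ``error_no`` field name and the helper functions are
illustrative assumptions, not names defined by this specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command IDs from the Commands table (no command 14 is listed). */
enum vfio_user_command {
    VFIO_USER_VERSION                  = 1,
    VFIO_USER_DMA_MAP                  = 2,
    VFIO_USER_DMA_UNMAP                = 3,
    VFIO_USER_DEVICE_GET_INFO          = 4,
    VFIO_USER_DEVICE_GET_REGION_INFO   = 5,
    VFIO_USER_DEVICE_GET_REGION_IO_FDS = 6,
    VFIO_USER_DEVICE_GET_IRQ_INFO      = 7,
    VFIO_USER_DEVICE_SET_IRQS          = 8,
    VFIO_USER_REGION_READ              = 9,
    VFIO_USER_REGION_WRITE             = 10,
    VFIO_USER_DMA_READ                 = 11,
    VFIO_USER_DMA_WRITE                = 12,
    VFIO_USER_DEVICE_RESET             = 13,
    VFIO_USER_REGION_WRITE_MULTI       = 15,
};

/* Flags field bits; these macro names are hypothetical. */
#define HDR_TYPE_MASK    0xfu      /* bits 0-3: Type */
#define HDR_TYPE_COMMAND 0x0u
#define HDR_TYPE_REPLY   0x1u
#define HDR_NO_REPLY     (1u << 4) /* bit 4: No_reply */
#define HDR_ERROR        (1u << 5) /* bit 5: Error */

struct vfio_user_header {
    uint16_t msg_id;   /* offset 0 */
    uint16_t command;  /* offset 2 */
    uint32_t msg_size; /* offset 4: whole message, header included */
    uint32_t flags;    /* offset 8 */
    uint32_t error_no; /* offset 12: errno in error replies */
};
static_assert(sizeof(struct vfio_user_header) == 16, "header is 16 bytes");
static_assert(offsetof(struct vfio_user_header, flags) == 8, "flags at 8");

/* True if the receiver of this message must send a reply. */
static int expects_reply(const struct vfio_user_header *h)
{
    return (h->flags & HDR_TYPE_MASK) == HDR_TYPE_COMMAND &&
           !(h->flags & HDR_NO_REPLY);
}

/* Reply header for cmd: same ID and command, Type = Reply. An error reply
 * carries only the header, so its payload size is forced to zero. */
static struct vfio_user_header make_reply(const struct vfio_user_header *cmd,
                                          uint32_t payload_size, int err)
{
    struct vfio_user_header r;
    r.msg_id = cmd->msg_id;
    r.command = cmd->command;
    r.msg_size = (uint32_t)sizeof(struct vfio_user_header) + (err ? 0 : payload_size);
    r.flags = HDR_TYPE_REPLY | (err ? HDR_ERROR : 0u);
    r.error_no = (uint32_t)err;
    return r;
}

/* Data is in host byte order, so the wire image is the struct's bytes. */
static void encode_header(const struct vfio_user_header *h, unsigned char out[16])
{
    memcpy(out, h, sizeof(*h));
}
```

Copying the structure byte for byte is only valid while both peers share the
same host endianness, as this revision requires.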

+----------------+--------+-------------+
| Name           | Offset | Size        |
+================+========+=============+
| Message ID     | 0      | 2           |
+----------------+--------+-------------+
| Command        | 2      | 2           |
+----------------+--------+-------------+
| Message size   | 4      | 4           |
+----------------+--------+-------------+
| Flags          | 8      | 4           |
+----------------+--------+-------------+
|                | +-----+------------+ |
|                | | Bit | Definition | |
|                | +=====+============+ |
|                | | 0-3 | Type       | |
|                | +-----+------------+ |
|                | | 4   | No_reply   | |
|                | +-----+------------+ |
|                | | 5   | Error      | |
|                | +-----+------------+ |
+----------------+--------+-------------+
| Error          | 12     | 4           |
+----------------+--------+-------------+
|                | 16     | variable    |
+----------------+--------+-------------+

* *Message ID* identifies the message, and is echoed in the command's reply
  message. Message IDs belong entirely to the sender, can be re-used (even
  concurrently), and the receiver must not make any assumptions about their
  uniqueness.
* *Command* specifies the command to be executed, listed in Commands_. It is
  also set in the reply header.
* *Message size* contains the size of the entire message, including the header.
* *Flags* contains attributes of the message:

  * The *Type* bits indicate the message type.

    * *Command* (value 0x0) indicates a command message.
    * *Reply* (value 0x1) indicates a reply message acknowledging a previous
      command with the same message ID.

  * *No_reply* in a command message indicates that no reply is needed for this
    command. This is commonly used when multiple commands are sent, and only
    the last needs acknowledgement.
  * *Error* in a reply message indicates that the command being acknowledged
    had an error. In this case, the *Error* field will be valid.

* *Error* in a reply message is an optional UNIX errno value.
  It may be zero even if the *Error* bit is set in *Flags*. It is reserved in a
  command message.

Each command message in Commands_ must be replied to with a reply message,
unless the message sets the *No_reply* bit. The reply consists of the header,
with its *Type* set to *Reply*, plus any additional data.

If an error occurs, the reply message must only include the reply header.

As the header is standard in both requests and replies, it is not included in
the command-specific specifications below; each message definition should be
appended to the standard header, and the offsets are given from the end of the
standard header.

``VFIO_USER_VERSION``
---------------------

.. _Version:

This is the initial message sent by the client after the socket connection is
established; the same format is used for the server's reply.

Upon establishing a connection, the client must send a ``VFIO_USER_VERSION``
message proposing a protocol version and a set of capabilities. The server
compares these with the versions and capabilities it supports and sends a
``VFIO_USER_VERSION`` reply according to the following rules.

* The major version in the reply must be the same as proposed. If the server
  does not support the proposed major version, it closes the connection.
* The minor version in the reply must be equal to or less than the minor
  version proposed.
* The capability list must be a subset of those proposed. If the server
  requires a capability the client did not include, it closes the connection.

The protocol major version will only change when incompatible protocol changes
are made, such as changing the message format. The minor version may change
when compatible changes are made, such as adding new messages or capabilities.
Both the client and the server must support all minor versions less than the
maximum minor version they support. E.g., an implementation that supports
version 1.3 must also support 1.0 through 1.2.
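The rules above leave the choice of minor version to the server within the
proposed bound. One policy that satisfies them is to reply with the smaller of
the proposed minor version and the highest minor version the server supports,
which works because every lower minor version must also be supported. The sketch
below models the JSON capabilities object as a bitmask purely for illustration;
``struct version_offer``, ``negotiate_version()`` and the bitmask representation
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-memory form of a VFIO_USER_VERSION payload. The real
 * capabilities are a JSON object; they are modelled here as a bitmask. */
struct version_offer {
    uint16_t major;
    uint16_t minor;
    uint32_t caps;
};

/* Server side. Returns 0 and fills *reply, or -1 if the server must close
 * the connection instead of replying. */
static int negotiate_version(const struct version_offer *proposed,
                             uint16_t my_major, uint16_t my_max_minor,
                             uint32_t my_caps, uint32_t required_caps,
                             struct version_offer *reply)
{
    if (proposed->major != my_major)
        return -1;                          /* major versions must match */
    if ((proposed->caps & required_caps) != required_caps)
        return -1;                          /* client lacks a required capability */
    reply->major = my_major;
    /* Never exceed the proposed minor; every lower minor is also supported. */
    reply->minor = proposed->minor < my_max_minor ? proposed->minor : my_max_minor;
    reply->caps = proposed->caps & my_caps; /* subset of what was proposed */
    return 0;
}
```

For example, a client proposing 1.3 to a server that supports up to 1.2 would
receive a 1.2 reply, and the client can then use that version because it must
also support every minor version below 1.3.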

When making a change to this specification, the protocol version number must
be included in the form "added in version X.Y".

Request
^^^^^^^

============== ====== ====
Name           Offset Size
============== ====== ====
version major  0      2
version minor  2      2
version data   4      variable (including terminating NUL). Optional.
============== ====== ====

The version data is an optional UTF-8 encoded JSON byte array with the following
format:

+--------------+--------+-----------------------------------+
| Name         | Type   | Description                       |
+==============+========+===================================+
| capabilities | object | Contains common capabilities that |
|              |        | the sender supports. Optional.    |
+--------------+--------+-----------------------------------+

Capabilities:

+--------------------+---------+------------------------------------------------+
| Name               | Type    | Description                                    |
+====================+=========+================================================+
| max_msg_fds        | number  | Maximum number of file descriptors that can be |
|                    |         | received by the sender in one message.         |
|                    |         | Optional. If not specified then the receiver   |
|                    |         | must assume a value of ``1``.                  |
+--------------------+---------+------------------------------------------------+
| max_data_xfer_size | number  | Maximum ``count`` for data transfer messages;  |
|                    |         | see `Read and Write Operations`_. Optional,    |
|                    |         | with a default value of 1048576 bytes.         |
+--------------------+---------+------------------------------------------------+
| pgsizes            | number  | Page sizes supported in DMA map operations     |
|                    |         | or'ed together. Optional, with a default value |
|                    |         | of supporting only 4k pages.                   |
+--------------------+---------+------------------------------------------------+
| max_dma_maps       | number  | Maximum number of DMA map windows that can be  |
|                    |         | valid simultaneously. Optional, with a default |
|                    |         | value of 65535 (64k-1).                        |
+--------------------+---------+------------------------------------------------+
| migration          | object  | Migration capability parameters. If missing    |
|                    |         | then migration is not supported by the sender. |
+--------------------+---------+------------------------------------------------+
| write_multiple     | boolean | ``VFIO_USER_REGION_WRITE_MULTI`` messages      |
|                    |         | are supported if the value is ``true``.        |
+--------------------+---------+------------------------------------------------+

The migration capability contains the following name/value pairs:

+-----------------+--------+--------------------------------------------------+
| Name            | Type   | Description                                      |
+=================+========+==================================================+
| pgsize          | number | Page size of dirty pages bitmap. The smaller of  |
|                 |        | the client's and the server's values is used.    |
+-----------------+--------+--------------------------------------------------+
| max_bitmap_size | number | Maximum bitmap size in ``VFIO_USER_DIRTY_PAGES`` |
|                 |        | and ``VFIO_USER_DMA_UNMAP`` messages. Optional,  |
|                 |        | with a default value of 256MB.                   |
+-----------------+--------+--------------------------------------------------+

Reply
^^^^^

The same message format is used in the server's reply with the semantics
described above.

``VFIO_USER_DMA_MAP``
---------------------

This command message is sent by the client to the server to inform it of the
memory regions the server can access. It must be sent before the server can
perform any DMA to the client. It is normally sent directly after the version
handshake is completed, but may also occur when memory is added to the client,
or if the client uses a vIOMMU.

Request
^^^^^^^

The request payload for this message is a structure of the following format:

+-------------+--------+-------------+
| Name        | Offset | Size        |
+=============+========+=============+
| argsz       | 0      | 4           |
+-------------+--------+-------------+
| flags       | 4      | 4           |
+-------------+--------+-------------+
|             | +-----+------------+ |
|             | | Bit | Definition | |
|             | +=====+============+ |
|             | | 0   | readable   | |
|             | +-----+------------+ |
|             | | 1   | writeable  | |
|             | +-----+------------+ |
+-------------+--------+-------------+
| offset      | 8      | 8           |
+-------------+--------+-------------+
| address     | 16     | 8           |
+-------------+--------+-------------+
| size        | 24     | 8           |
+-------------+--------+-------------+

* *argsz* is the size of the above structure. Note there is no reply payload,
  so this field differs from other message types.
* *flags* contains the following region attributes:

  * *readable* indicates that the region can be read from.

  * *writeable* indicates that the region can be written to.

* *offset* is the file offset of the region with respect to the associated file
  descriptor, or zero if the region is not mappable.
* *address* is the base DMA address of the region.
* *size* is the size of the region.
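
As an illustration of the request layout above, the following Python sketch packs and unpacks the 32-byte ``VFIO_USER_DMA_MAP`` payload with the standard ``struct`` module. It covers only the payload, not the 16-byte message header, and it assumes little-endian byte order; the helper names are hypothetical.

```python
import struct

# Flag bits from the VFIO_USER_DMA_MAP request table.
DMA_MAP_FLAG_READABLE = 1 << 0
DMA_MAP_FLAG_WRITEABLE = 1 << 1

# argsz (u32), flags (u32), offset (u64), address (u64), size (u64).
# Little-endian byte order is an assumption of this sketch.
DMA_MAP_FMT = "<IIQQQ"


def pack_dma_map(address: int, size: int, offset: int = 0,
                 readable: bool = True, writeable: bool = True) -> bytes:
    """Build a VFIO_USER_DMA_MAP request payload (without the message header)."""
    flags = 0
    if readable:
        flags |= DMA_MAP_FLAG_READABLE
    if writeable:
        flags |= DMA_MAP_FLAG_WRITEABLE
    argsz = struct.calcsize(DMA_MAP_FMT)  # 32: the size of the structure itself
    return struct.pack(DMA_MAP_FMT, argsz, flags, offset, address, size)


def unpack_dma_map(payload: bytes) -> dict:
    """Decode a request payload back into its named fields."""
    argsz, flags, offset, address, size = struct.unpack_from(DMA_MAP_FMT, payload)
    return {"argsz": argsz, "flags": flags, "offset": offset,
            "address": address, "size": size}
```

A read-only region that cannot be mapped would be sent as ``pack_dma_map(addr, size, writeable=False)`` with no file descriptor attached; *offset* stays zero in that case.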

This structure is 32 bytes in size, so the message size is 16 + 32 bytes.

If the DMA region being added can be directly mapped by the server, a file
descriptor must be sent as part of the message meta-data. The region can be
mapped via the mmap() system call. On ``AF_UNIX`` sockets, the file descriptor
must be passed as ``SCM_RIGHTS`` type ancillary data. Otherwise, if the DMA
region cannot be directly mapped by the server, no file descriptor may be sent
as part of the message meta-data, and the DMA region can be accessed by the
server using ``VFIO_USER_DMA_READ`` and ``VFIO_USER_DMA_WRITE`` messages,
explained in `Read and Write Operations`_. The server must fail a command that
maps over an existing region, setting ``EEXIST`` in the error field of the
reply.

Reply
^^^^^

There is no payload in the reply message.

``VFIO_USER_DMA_UNMAP``
-----------------------

This command message is sent by the client to the server to inform it that a
DMA region, previously made available via a ``VFIO_USER_DMA_MAP`` command
message, is no longer available for DMA. This typically occurs when memory is
subtracted from the client or if the client uses a vIOMMU.

Request
^^^^^^^

The request payload for this message is a structure of the following format:

+--------------+--------+------------------------+
| Name         | Offset | Size                   |
+==============+========+========================+
| argsz        | 0      | 4                      |
+--------------+--------+------------------------+
| flags        | 4      | 4                      |
+--------------+--------+------------------------+
| address      | 8      | 8                      |
+--------------+--------+------------------------+
| size         | 16     | 8                      |
+--------------+--------+------------------------+

* *argsz* is the maximum size of the reply payload.
* *flags* is unused in this version.
* *address* is the base DMA address of the DMA region.
* *size* is the size of the DMA region.

The address and size of the DMA region being unmapped must exactly match a
previous mapping.

Reply
^^^^^

Upon receiving a ``VFIO_USER_DMA_UNMAP`` command, if the file descriptor is
mapped then the server must release all references to that DMA region before
replying, which potentially includes in-flight DMA transactions.

The server responds with the original DMA entry in the request.


``VFIO_USER_DEVICE_GET_INFO``
-----------------------------

This command message is sent by the client to the server to query for basic
information about the device.

Request
^^^^^^^

+-------------+--------+--------------------------+
| Name        | Offset | Size                     |
+=============+========+==========================+
| argsz       | 0      | 4                        |
+-------------+--------+--------------------------+
| flags       | 4      | 4                        |
+-------------+--------+--------------------------+
|             | +-----+-------------------------+ |
|             | | Bit | Definition              | |
|             | +=====+=========================+ |
|             | | 0   | VFIO_DEVICE_FLAGS_RESET | |
|             | +-----+-------------------------+ |
|             | | 1   | VFIO_DEVICE_FLAGS_PCI   | |
|             | +-----+-------------------------+ |
+-------------+--------+--------------------------+
| num_regions | 8      | 4                        |
+-------------+--------+--------------------------+
| num_irqs    | 12     | 4                        |
+-------------+--------+--------------------------+

* *argsz* is the maximum size of the reply payload.
* All other fields must be zero.
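
A minimal sketch of the ``VFIO_USER_DEVICE_GET_INFO`` request, assuming little-endian encoding and hypothetical helper names: the client advertises in *argsz* how large a reply it can accept and zeroes every other field. Because the reply described next uses the same 16-byte layout, one decoder serves both directions.

```python
import struct

# Bit positions from the flags table above.
VFIO_DEVICE_FLAGS_RESET = 1 << 0
VFIO_DEVICE_FLAGS_PCI = 1 << 1

# argsz, flags, num_regions, num_irqs: four 32-bit fields (little-endian assumed).
DEVICE_INFO_FMT = "<IIII"
DEVICE_INFO_SIZE = struct.calcsize(DEVICE_INFO_FMT)  # 16 bytes


def pack_device_get_info_request(max_reply: int = DEVICE_INFO_SIZE) -> bytes:
    """Client request: argsz is the largest reply accepted, other fields zero."""
    return struct.pack(DEVICE_INFO_FMT, max_reply, 0, 0, 0)


def unpack_device_info(payload: bytes) -> dict:
    """Decode the 16-byte structure shared by the request and the reply."""
    argsz, flags, num_regions, num_irqs = struct.unpack_from(DEVICE_INFO_FMT, payload)
    return {
        "argsz": argsz,
        "resettable": bool(flags & VFIO_DEVICE_FLAGS_RESET),
        "pci": bool(flags & VFIO_DEVICE_FLAGS_PCI),
        "num_regions": num_regions,
        "num_irqs": num_irqs,
    }
```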
+ +Reply +^^^^^ + ++-------------+--------+--------------------------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ +| argsz | 0 | 4 | ++-------------+--------+--------------------------+ +| flags | 4 | 4 | ++-------------+--------+--------------------------+ +| | +-----+-------------------------+ | +| | | Bit | Definition | | +| | +=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ | +| | | 0 | VFIO_DEVICE_FLAGS_RESET | | +| | +-----+-------------------------+ | +| | | 1 | VFIO_DEVICE_FLAGS_PCI | | +| | +-----+-------------------------+ | ++-------------+--------+--------------------------+ +| num_regions | 8 | 4 | ++-------------+--------+--------------------------+ +| num_irqs | 12 | 4 | ++-------------+--------+--------------------------+ + +* *argsz* is the size required for the full reply payload (16 bytes today) +* *flags* contains the following device attributes. + + * ``VFIO_DEVICE_FLAGS_RESET`` indicates that the device supports the + ``VFIO_USER_DEVICE_RESET`` message. + * ``VFIO_DEVICE_FLAGS_PCI`` indicates that the device is a PCI device. + +* *num_regions* is the number of memory regions that the device exposes. +* *num_irqs* is the number of distinct interrupt types that the device sup= ports. + +This version of the protocol only supports PCI devices. Additional devices= may +be supported in future versions. + +``VFIO_USER_DEVICE_GET_REGION_INFO`` +------------------------------------ + +This command message is sent by the client to the server to query for +information about device regions. The VFIO region info structure is define= d in +```` (``struct vfio_region_info``). 
+ +Request +^^^^^^^ + ++------------+--------+------------------------------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D+ +| argsz | 0 | 4 | ++------------+--------+------------------------------+ +| flags | 4 | 4 | ++------------+--------+------------------------------+ +| index | 8 | 4 | ++------------+--------+------------------------------+ +| cap_offset | 12 | 4 | ++------------+--------+------------------------------+ +| size | 16 | 8 | ++------------+--------+------------------------------+ +| offset | 24 | 8 | ++------------+--------+------------------------------+ + +* *argsz* the maximum size of the reply payload +* *index* is the index of memory region being queried, it is the only field + that is required to be set in the command message. +* all other fields must be zero. + +Reply +^^^^^ + ++------------+--------+------------------------------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D+ +| argsz | 0 | 4 | ++------------+--------+------------------------------+ +| flags | 4 | 4 | ++------------+--------+------------------------------+ +| | +-----+-----------------------------+ | +| | | Bit | Definition | | +| | +=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ | +| | | 0 | VFIO_REGION_INFO_FLAG_READ | | +| | +-----+-----------------------------+ | +| | | 1 | VFIO_REGION_INFO_FLAG_WRITE | | +| | +-----+-----------------------------+ | +| | | 2 | VFIO_REGION_INFO_FLAG_MMAP | | +| | +-----+-----------------------------+ | +| | | 3 | VFIO_REGION_INFO_FLAG_CAPS | | +| | +-----+-----------------------------+ | ++------------+--------+------------------------------+ ++------------+--------+------------------------------+ +| 
index | 8 | 4 | ++------------+--------+------------------------------+ +| cap_offset | 12 | 4 | ++------------+--------+------------------------------+ +| size | 16 | 8 | ++------------+--------+------------------------------+ +| offset | 24 | 8 | ++------------+--------+------------------------------+ + +* *argsz* is the size required for the full reply payload (region info str= ucture + plus the size of any region capabilities) +* *flags* are attributes of the region: + + * ``VFIO_REGION_INFO_FLAG_READ`` allows client read access to the region. + * ``VFIO_REGION_INFO_FLAG_WRITE`` allows client write access to the regi= on. + * ``VFIO_REGION_INFO_FLAG_MMAP`` specifies the client can mmap() the reg= ion. + When this flag is set, the reply will include a file descriptor in its + meta-data. On ``AF_UNIX`` sockets, the file descriptors will be passed= as + ``SCM_RIGHTS`` type ancillary data. + * ``VFIO_REGION_INFO_FLAG_CAPS`` indicates additional capabilities found= in the + reply. + +* *index* is the index of memory region being queried, it is the only field + that is required to be set in the command message. +* *cap_offset* describes where additional region capabilities can be found. + cap_offset is relative to the beginning of the VFIO region info structur= e. + The data structure it points is a VFIO cap header defined in + ````. +* *size* is the size of the region. +* *offset* is the offset that should be given to the mmap() system call for + regions with the MMAP attribute. It is also used as the base offset when + mapping a VFIO sparse mmap area, described below. + +VFIO region capabilities +"""""""""""""""""""""""" + +The VFIO region information can also include a capabilities list. This lis= t is +similar to a PCI capability list - each entry has a common header that +identifies a capability and where the next capability in the list can be f= ound. +The VFIO capability header format is defined in ```` (``stru= ct +vfio_info_cap_header``). 
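
Walking the capability list amounts to following *next* offsets starting from *cap_offset*. The sketch below does this over a raw reply buffer. It assumes little-endian encoding, the 32-byte fixed part of the region info reply shown in the table above, the 8-byte header layout (*id*, *version*, *next*) described in the next subsection, and the Linux VFIO convention that a *next* of zero ends the chain. It also rejects loops and out-of-range offsets, which a defensive client may want to do; the function name is hypothetical.

```python
import struct

VFIO_REGION_INFO_FLAG_CAPS = 1 << 3
REGION_INFO_FMT = "<IIIIQQ"  # argsz, flags, index, cap_offset, size, offset (32 bytes)
CAP_HEADER_FMT = "<HHI"      # id, version, next (8 bytes)
CAP_HEADER_SIZE = struct.calcsize(CAP_HEADER_FMT)


def iter_region_caps(info: bytes):
    """Yield (id, version, offset) for each capability in a region info reply.

    All offsets are relative to the start of the region info structure, and a
    next value of 0 terminates the list.
    """
    _argsz, flags, _index, cap_offset, _size, _offset = struct.unpack_from(
        REGION_INFO_FMT, info)
    if not flags & VFIO_REGION_INFO_FLAG_CAPS:
        return
    seen = set()
    off = cap_offset
    while off:
        # Reject loops and headers that would run past the received data.
        if off in seen or off + CAP_HEADER_SIZE > len(info):
            raise ValueError(f"malformed capability chain at offset {off}")
        seen.add(off)
        cap_id, version, next_off = struct.unpack_from(CAP_HEADER_FMT, info, off)
        yield cap_id, version, off
        off = next_off
```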

VFIO cap header format
""""""""""""""""""""""

+---------+--------+------+
| Name    | Offset | Size |
+=========+========+======+
| id      | 0      | 2    |
+---------+--------+------+
| version | 2      | 2    |
+---------+--------+------+
| next    | 4      | 4    |
+---------+--------+------+

* *id* is the capability identity.
* *version* is a capability-specific version number.
* *next* specifies the offset of the next capability in the capability list. It
  is relative to the beginning of the VFIO region info structure.

VFIO sparse mmap cap header
"""""""""""""""""""""""""""

+------------------+----------------------------------+
| Name             | Value                            |
+==================+==================================+
| id               | VFIO_REGION_INFO_CAP_SPARSE_MMAP |
+------------------+----------------------------------+
| version          | 0x1                              |
+------------------+----------------------------------+
| next             |                                  |
+------------------+----------------------------------+
| sparse mmap info | VFIO region info sparse mmap     |
+------------------+----------------------------------+

This capability is defined when only a subrange of the region supports
direct access by the client via mmap(). The VFIO sparse mmap area is defined in
```` (``struct vfio_region_sparse_mmap_area`` and ``struct
vfio_region_info_cap_sparse_mmap``).

VFIO region info cap sparse mmap
""""""""""""""""""""""""""""""""

+----------+--------+------+
| Name     | Offset | Size |
+==========+========+======+
| nr_areas | 0      | 4    |
+----------+--------+------+
| reserved | 4      | 4    |
+----------+--------+------+
| offset   | 8      | 8    |
+----------+--------+------+
| size     | 16     | 8    |
+----------+--------+------+
| ...      |        |      |
+----------+--------+------+

* *nr_areas* is the number of sparse mmap areas in the region.
* *offset* and *size* describe a single area that can be mapped by the client.
  There will be *nr_areas* pairs of offset and size. The offset will be added to
  the base offset given in the ``VFIO_USER_DEVICE_GET_REGION_INFO`` reply to form
  the offset argument of the subsequent mmap() call.

The VFIO sparse mmap area is defined in ```` (``struct
vfio_region_info_cap_sparse_mmap``).


``VFIO_USER_DEVICE_GET_REGION_IO_FDS``
--------------------------------------

Clients can access regions via ``VFIO_USER_REGION_READ/WRITE`` or, if provided, by
``mmap()`` of a file descriptor provided by the server.

``VFIO_USER_DEVICE_GET_REGION_IO_FDS`` provides an alternative access mechanism via
file descriptors. This is an optional feature intended for performance
improvements where an underlying sub-system (such as KVM) supports communication
across such file descriptors to the vfio-user server, without needing to
round-trip through the client.

The server returns an array of sub-regions for the requested region. Each
sub-region describes a span (offset and size) of a region, along with the
requested file descriptor notification mechanism to use. Each sub-region in the
response message may choose to use a different method, as defined below. The
two mechanisms supported in this specification are ioeventfds and ioregionfds.

In addition, the server returns a file descriptor in the ancillary data; clients
are expected to configure each sub-region's file descriptor with the requested
notification method. For example, a client could configure KVM with the
requested ioeventfd via a ``KVM_IOEVENTFD`` ``ioctl()``.
+ +Request +^^^^^^^ + ++-------------+--------+------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D= =3D=3D=3D=3D+ +| argsz | 0 | 4 | ++-------------+--------+------+ +| flags | 4 | 4 | ++-------------+--------+------+ +| index | 8 | 4 | ++-------------+--------+------+ +| count | 12 | 4 | ++-------------+--------+------+ + +* *argsz* the maximum size of the reply payload +* *index* is the index of memory region being queried +* all other fields must be zero + +The client must set ``flags`` to zero and specify the region being queried= in +the ``index``. + +Reply +^^^^^ + ++-------------+--------+------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D= =3D=3D=3D=3D+ +| argsz | 0 | 4 | ++-------------+--------+------+ +| flags | 4 | 4 | ++-------------+--------+------+ +| index | 8 | 4 | ++-------------+--------+------+ +| count | 12 | 4 | ++-------------+--------+------+ +| sub-regions | 16 | ... | ++-------------+--------+------+ + +* *argsz* is the size of the region IO FD info structure plus the + total size of the sub-region array. Thus, each array entry "i" is at off= set + i * ((argsz - 32) / count). Note that currently this is 40 bytes for bot= h IO + FD types, but this is not to be relied on. As elsewhere, this indicates = the + full reply payload size needed. +* *flags* must be zero +* *index* is the index of memory region being queried +* *count* is the number of sub-regions in the array +* *sub-regions* is the array of Sub-Region IO FD info structures + +The reply message will additionally include at least one file descriptor i= n the +ancillary data. Note that more than one sub-region may share the same file +descriptor. + +Note that it is the client's responsibility to verify the requested values= (for +example, that the requested offset does not exceed the region's bounds). 

Each sub-region given in the response has one of two possible structures,
depending on whether *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD`` or
``VFIO_USER_IO_FD_TYPE_IOREGIONFD``:

Sub-Region IO FD info format (ioeventfd)
""""""""""""""""""""""""""""""""""""""""

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| datamatch | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region).
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations.
* *fd_index* is the index in the ancillary data of the FD to use for ioeventfd
  notification; it may be shared.
* *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD``.
* *flags* is any of:

  * ``KVM_IOEVENTFD_FLAG_DATAMATCH``
  * ``KVM_IOEVENTFD_FLAG_PIO``
  * ``KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY`` (FIXME: makes sense?)

* *datamatch* is the datamatch value if needed.

See https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt, *4.59
KVM_IOEVENTFD* for further context on the ioeventfd-specific fields.
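
The following sketch decodes one 40-byte ioeventfd sub-region entry and performs the bounds check that this specification leaves to the client. It assumes little-endian encoding and takes the received descriptors as a plain list; the numeric value of ``VFIO_USER_IO_FD_TYPE_IOEVENTFD`` is not given in this section, so the sketch does not check *type*. ``KVM_IOEVENTFD_FLAG_DATAMATCH`` is bit 0, as in the Linux KVM headers; the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple, Optional

# offset, size, fd_index, type, flags, padding, datamatch (40 bytes, little-endian assumed)
IOEVENTFD_ENTRY_FMT = "<QQIIIIQ"
KVM_IOEVENTFD_FLAG_DATAMATCH = 1 << 0


class IoeventfdSubRegion(NamedTuple):
    offset: int
    size: int
    fd_index: int
    fd_type: int
    flags: int
    datamatch: Optional[int]  # None unless KVM_IOEVENTFD_FLAG_DATAMATCH is set


def parse_ioeventfd_entry(entry: bytes, fds: list, region_size: int):
    """Decode one entry and return it together with the descriptor it refers to."""
    (offset, size, fd_index, fd_type, flags,
     _padding, datamatch) = struct.unpack_from(IOEVENTFD_ENTRY_FMT, entry)
    # The client is responsible for validating what the server asked for.
    if offset + size > region_size:
        raise ValueError("sub-region exceeds the region's bounds")
    if fd_index >= len(fds):
        raise ValueError("fd_index does not refer to a received descriptor")
    match = datamatch if flags & KVM_IOEVENTFD_FLAG_DATAMATCH else None
    return IoeventfdSubRegion(offset, size, fd_index, fd_type, flags, match), fds[fd_index]
```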

Sub-Region IO FD info format (ioregionfd)
"""""""""""""""""""""""""""""""""""""""""

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| user_data | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region).
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations; ``KVM_IOREGION_POSTED_WRITES``
  must be set in *flags* in this case.
* *fd_index* is the index in the ancillary data of the FD to use for ioregionfd
  messages; it may be shared.
* *type* is ``VFIO_USER_IO_FD_TYPE_IOREGIONFD``.
* *flags* is any of:

  * ``KVM_IOREGION_PIO``
  * ``KVM_IOREGION_POSTED_WRITES``

* *user_data* is an opaque value passed back to the server via a message on the
  file descriptor.

For further information on the ioregionfd-specific fields, see:
https://lore.kernel.org/kvm/cover.1613828726.git.eafanasova@gmail.com/

(FIXME: update with final API docs.)

``VFIO_USER_DEVICE_GET_IRQ_INFO``
---------------------------------

This command message is sent by the client to the server to query for
information about device interrupt types. The VFIO IRQ info structure is
defined in ```` (``struct vfio_irq_info``).
+ +Request +^^^^^^^ + ++-------+--------+---------------------------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ +| argsz | 0 | 4 | ++-------+--------+---------------------------+ +| flags | 4 | 4 | ++-------+--------+---------------------------+ +| | +-----+--------------------------+ | +| | | Bit | Definition | | +| | +=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ | +| | | 0 | VFIO_IRQ_INFO_EVENTFD | | +| | +-----+--------------------------+ | +| | | 1 | VFIO_IRQ_INFO_MASKABLE | | +| | +-----+--------------------------+ | +| | | 2 | VFIO_IRQ_INFO_AUTOMASKED | | +| | +-----+--------------------------+ | +| | | 3 | VFIO_IRQ_INFO_NORESIZE | | +| | +-----+--------------------------+ | ++-------+--------+---------------------------+ +| index | 8 | 4 | ++-------+--------+---------------------------+ +| count | 12 | 4 | ++-------+--------+---------------------------+ + +* *argsz* is the maximum size of the reply payload (16 bytes today) +* index is the index of IRQ type being queried (e.g. 
``VFIO_PCI_MSIX_IRQ_I= NDEX``) +* all other fields must be zero + +Reply +^^^^^ + ++-------+--------+---------------------------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ +| argsz | 0 | 4 | ++-------+--------+---------------------------+ +| flags | 4 | 4 | ++-------+--------+---------------------------+ +| | +-----+--------------------------+ | +| | | Bit | Definition | | +| | +=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D+ | +| | | 0 | VFIO_IRQ_INFO_EVENTFD | | +| | +-----+--------------------------+ | +| | | 1 | VFIO_IRQ_INFO_MASKABLE | | +| | +-----+--------------------------+ | +| | | 2 | VFIO_IRQ_INFO_AUTOMASKED | | +| | +-----+--------------------------+ | +| | | 3 | VFIO_IRQ_INFO_NORESIZE | | +| | +-----+--------------------------+ | ++-------+--------+---------------------------+ +| index | 8 | 4 | ++-------+--------+---------------------------+ +| count | 12 | 4 | ++-------+--------+---------------------------+ + +* *argsz* is the size required for the full reply payload (16 bytes today) +* *flags* defines IRQ attributes: + + * ``VFIO_IRQ_INFO_EVENTFD`` indicates the IRQ type can support server ev= entfd + signalling. + * ``VFIO_IRQ_INFO_MASKABLE`` indicates that the IRQ type supports the ``= MASK`` + and ``UNMASK`` actions in a ``VFIO_USER_DEVICE_SET_IRQS`` message. + * ``VFIO_IRQ_INFO_AUTOMASKED`` indicates the IRQ type masks itself after= being + triggered, and the client must send an ``UNMASK`` action to receive new + interrupts. + * ``VFIO_IRQ_INFO_NORESIZE`` indicates ``VFIO_USER_SET_IRQS`` operations= setup + interrupts as a set, and new sub-indexes cannot be enabled without dis= abling + the entire type. +* index is the index of IRQ type being queried +* count describes the number of interrupts of the queried type. 
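
A sketch of the ``VFIO_USER_DEVICE_GET_IRQ_INFO`` exchange, assuming little-endian encoding and hypothetical helper names. The bit positions are taken from the flags table above, and the request sets *argsz* to the 16-byte reply size with all other fields except *index* zeroed.

```python
import struct

VFIO_IRQ_INFO_EVENTFD = 1 << 0
VFIO_IRQ_INFO_MASKABLE = 1 << 1
VFIO_IRQ_INFO_AUTOMASKED = 1 << 2
VFIO_IRQ_INFO_NORESIZE = 1 << 3

IRQ_INFO_FMT = "<IIII"  # argsz, flags, index, count


def pack_irq_info_request(index: int) -> bytes:
    """Query one IRQ type; argsz is the largest reply the client accepts."""
    return struct.pack(IRQ_INFO_FMT, struct.calcsize(IRQ_INFO_FMT), 0, index, 0)


def parse_irq_info_reply(payload: bytes) -> dict:
    """Decode the reply and expand the flag bits into booleans."""
    _argsz, flags, index, count = struct.unpack_from(IRQ_INFO_FMT, payload)
    return {
        "index": index,
        "count": count,
        "eventfd": bool(flags & VFIO_IRQ_INFO_EVENTFD),
        "maskable": bool(flags & VFIO_IRQ_INFO_MASKABLE),
        "automasked": bool(flags & VFIO_IRQ_INFO_AUTOMASKED),
        "noresize": bool(flags & VFIO_IRQ_INFO_NORESIZE),
    }
```

A client that sees ``automasked`` set would follow every delivered interrupt of that type with an ``UNMASK`` action.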

``VFIO_USER_DEVICE_SET_IRQS``
-----------------------------

This command message is sent by the client to the server to set actions for
device interrupt types. The VFIO IRQ set structure is defined in
```` (``struct vfio_irq_set``).

Request
^^^^^^^

+-------+--------+------------------------------+
| Name  | Offset | Size                         |
+=======+========+==============================+
| argsz | 0      | 4                            |
+-------+--------+------------------------------+
| flags | 4      | 4                            |
+-------+--------+------------------------------+
|       | +-----+-----------------------------+ |
|       | | Bit | Definition                  | |
|       | +=====+=============================+ |
|       | | 0   | VFIO_IRQ_SET_DATA_NONE      | |
|       | +-----+-----------------------------+ |
|       | | 1   | VFIO_IRQ_SET_DATA_BOOL      | |
|       | +-----+-----------------------------+ |
|       | | 2   | VFIO_IRQ_SET_DATA_EVENTFD   | |
|       | +-----+-----------------------------+ |
|       | | 3   | VFIO_IRQ_SET_ACTION_MASK    | |
|       | +-----+-----------------------------+ |
|       | | 4   | VFIO_IRQ_SET_ACTION_UNMASK  | |
|       | +-----+-----------------------------+ |
|       | | 5   | VFIO_IRQ_SET_ACTION_TRIGGER | |
|       | +-----+-----------------------------+ |
+-------+--------+------------------------------+
| index | 8      | 4                            |
+-------+--------+------------------------------+
| start | 12     | 4                            |
+-------+--------+------------------------------+
| count | 16     | 4                            |
+-------+--------+------------------------------+
| data  | 20     | variable                     |
+-------+--------+------------------------------+

* *argsz* is the size of the VFIO IRQ set request payload, including any *data*
  field. Note there is no reply payload, so this field differs from other
  message types.
* *flags* defines the action performed on the interrupt range. The ``DATA``
  flags describe the data field sent in the message; the ``ACTION`` flags
  describe the action to be performed. Within each of the two sets, the flags
  are mutually exclusive.

  * ``VFIO_IRQ_SET_DATA_NONE`` indicates there is no data field in the command.
    The action is performed unconditionally.
  * ``VFIO_IRQ_SET_DATA_BOOL`` indicates the data field is an array of boolean
    bytes. The action is performed if the corresponding boolean is true.
  * ``VFIO_IRQ_SET_DATA_EVENTFD`` indicates an array of event file descriptors
    was sent in the message meta-data. These descriptors will be signalled when
    the action defined by the action flags occurs. On ``AF_UNIX`` sockets, the
    descriptors are sent as ``SCM_RIGHTS`` type ancillary data.
    If no file descriptors are provided, this de-assigns the specified
    previously configured interrupts.
  * ``VFIO_IRQ_SET_ACTION_MASK`` indicates a masking event. It can be used with
    ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to mask an interrupt,
    or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the guest masks
    the interrupt.
  * ``VFIO_IRQ_SET_ACTION_UNMASK`` indicates an unmasking event. It can be used
    with ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to unmask an
    interrupt, or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the
    guest unmasks the interrupt.
  * ``VFIO_IRQ_SET_ACTION_TRIGGER`` indicates a triggering event. It can be used
    with ``VFIO_IRQ_SET_DATA_BOOL`` or ``VFIO_IRQ_SET_DATA_NONE`` to trigger an
    interrupt, or with ``VFIO_IRQ_SET_DATA_EVENTFD`` to generate an event when the
    server triggers the interrupt.

* *index* is the index of the IRQ type being set up.
* *start* is the start of the sub-index being set.
* *count* describes the number of sub-indexes being set. As a special case, a
  count (and start) of 0, with data flags of ``VFIO_IRQ_SET_DATA_NONE``, disables
  all interrupts of the index.
* *data* is an optional field included when the
  ``VFIO_IRQ_SET_DATA_BOOL`` flag is present. It contains an array of booleans
  that specify whether the action is to be performed on the corresponding
  index. It's used when the action is only performed on a subset of the range
  specified.

Not all interrupt types support every combination of data and action flags.
The client must know the capabilities of the device and IRQ index before it
sends a ``VFIO_USER_DEVICE_SET_IRQS`` message.

In typical operation, a specific IRQ may operate as follows:

1. The client sends a ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_EVENTFD|VFIO_IRQ_SET_ACTION_TRIGGER)`` along
   with an eventfd. This associates the IRQ with a particular eventfd on the
   server side.

2. The client may send a ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_EVENTFD|VFIO_IRQ_SET_ACTION_MASK/UNMASK)`` along
   with another eventfd. This associates the given eventfd with the
   mask/unmask state on the server side.

3. The server may trigger the IRQ by writing 1 to the eventfd.

4. The server may mask/unmask an IRQ, which will write 1 to the corresponding
   mask/unmask eventfd, if there is one.

5. A client may trigger a device IRQ itself, by sending a
   ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_NONE/BOOL|VFIO_IRQ_SET_ACTION_TRIGGER)``.

6. A client may mask or unmask the IRQ, by sending a
   ``VFIO_USER_DEVICE_SET_IRQS`` message with
   ``flags=(VFIO_IRQ_SET_DATA_NONE/BOOL|VFIO_IRQ_SET_ACTION_MASK/UNMASK)``.

Reply
^^^^^

There is no payload in the reply.

.. _Read and Write Operations:

Note that all of these operations must be supported by the client and/or server,
even if the corresponding memory or device region has been shared as mappable.

The ``count`` field must not exceed the value of ``max_data_xfer_size`` of the
peer, for both reads and writes.
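
Since *count* is bounded by the peer's ``max_data_xfer_size``, a larger access has to be split across several messages. A minimal sketch with a hypothetical helper name; the default of 1048576 bytes is the value the capabilities table specifies when the peer does not advertise one.

```python
DEFAULT_MAX_DATA_XFER_SIZE = 1048576  # used when the peer omits max_data_xfer_size


def split_transfer(offset: int, length: int,
                   max_xfer: int = DEFAULT_MAX_DATA_XFER_SIZE):
    """Yield (offset, count) pairs so that no single message's count exceeds max_xfer."""
    if max_xfer <= 0:
        raise ValueError("max_xfer must be positive")
    end = offset + length
    while offset < end:
        count = min(max_xfer, end - offset)
        yield offset, count
        offset += count
```

For example, ``list(split_transfer(0, 10, 4))`` yields ``[(0, 4), (4, 4), (8, 2)]``; each pair becomes one read or write message.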

``VFIO_USER_REGION_READ``
-------------------------

If a device region is not mappable, it's not directly accessible by the client
via ``mmap()`` of the underlying file descriptor. In this case, a client can
read from a device region with this message.

Request
^^^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+

* *offset* is the offset into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred.

Reply
^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | variable |
+--------+--------+----------+

* *offset* is the offset into the region accessed.
* *region* is the index of the region accessed.
* *count* is the size of the data transferred.
* *data* is the data that was read from the device region.

``VFIO_USER_REGION_WRITE``
--------------------------

If a device region is not mappable, it's not directly accessible by the client
via ``mmap()`` of the underlying file descriptor. In this case, a client can
write to a device region with this message.

Request
^^^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+
| data   | 16     | variable |
+--------+--------+----------+

* *offset* is the offset into the region being accessed.
* *region* is the index of the region being accessed.
* *count* is the size of the data to be transferred.
* *data* is the data to write.

Reply
^^^^^

+--------+--------+----------+
| Name   | Offset | Size     |
+========+========+==========+
| offset | 0      | 8        |
+--------+--------+----------+
| region | 8      | 4        |
+--------+--------+----------+
| count  | 12     | 4        |
+--------+--------+----------+

* *offset* is the offset into the region accessed.
* *region* is the index of the region accessed.
* *count* is the size of the data transferred.

``VFIO_USER_DMA_READ``
-----------------------

If the client has not shared mappable memory, the server can use this message to
read from guest memory.

Request
^^^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed. This address must have
  been previously exported to the server with a ``VFIO_USER_DMA_MAP`` message.
* *count* is the size of the data to be transferred.

Reply
^^^^^

+---------+--------+----------+
| Name    | Offset | Size     |
+=========+========+==========+
| address | 0      | 8        |
+---------+--------+----------+
| count   | 8      | 8        |
+---------+--------+----------+
| data    | 16     | variable |
+---------+--------+----------+

* *address* is the client DMA memory address being accessed.
* *count* is the size of the data transferred.
* *data* is the data read.

``VFIO_USER_DMA_WRITE``
-----------------------

If the client has not shared mappable memory, the server can use this message to
write to guest memory.
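
Both ends need some record of the regions added by ``VFIO_USER_DMA_MAP``: the server to reject a map over an existing region and an unmap that does not exactly match a previous mapping, and either side to check that a ``VFIO_USER_DMA_READ`` or ``VFIO_USER_DMA_WRITE`` range lies inside an exported region. The hypothetical class below sketches one way to keep that record. It treats any overlap as "mapping over an existing region", which is one reading of that rule, and it does not model file descriptors.

```python
import bisect
import errno


class DmaRegions:
    """Hypothetical table of regions added by VFIO_USER_DMA_MAP."""

    def __init__(self):
        self._starts = []  # sorted base addresses
        self._sizes = {}   # base address -> size

    def map(self, address: int, size: int) -> int:
        """Return 0, or errno.EEXIST if the new range overlaps an existing one."""
        i = bisect.bisect_left(self._starts, address)
        if i > 0:
            prev = self._starts[i - 1]
            if prev + self._sizes[prev] > address:
                return errno.EEXIST
        if i < len(self._starts) and self._starts[i] < address + size:
            return errno.EEXIST
        self._starts.insert(i, address)
        self._sizes[address] = size
        return 0

    def unmap(self, address: int, size: int) -> bool:
        """Succeed only if (address, size) exactly matches a previous mapping."""
        if self._sizes.get(address) != size:
            return False
        self._starts.remove(address)
        del self._sizes[address]
        return True

    def contains(self, address: int, count: int) -> bool:
        """True if [address, address + count) lies inside a single mapped region."""
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return False
        base = self._starts[i]
        return address + count <= base + self._sizes[base]
```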
+ +Request +^^^^^^^ + ++---------+--------+----------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D+ +| address | 0 | 8 | ++---------+--------+----------+ +| count | 8 | 8 | ++---------+--------+----------+ +| data | 16 | variable | ++---------+--------+----------+ + +* *address* is the client DMA memory address being accessed. This address = must have + been previously exported to the server with a ``VFIO_USER_DMA_MAP`` mess= age. +* *count* is the size of the data to be transferred. +* *data* is the data to write + +Reply +^^^^^ + ++---------+--------+----------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D+ +| address | 0 | 8 | ++---------+--------+----------+ +| count | 8 | 4 | ++---------+--------+----------+ + +* *address* is the client DMA memory address being accessed. +* *count* is the size of the data transferred. + +``VFIO_USER_DEVICE_RESET`` +-------------------------- + +This command message is sent from the client to the server to reset the de= vice. +Neither the request or reply have a payload. + +``VFIO_USER_REGION_WRITE_MULTI`` +-------------------------------- + +This message can be used to coalesce multiple device write operations +into a single messgage. It is only used as an optimization when the +outgoing message queue is relatively full. 
+ +Request +^^^^^^^ + ++---------+--------+----------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D+ +| wr_cnt | 0 | 8 | ++---------+--------+----------+ +| wrs | 8 | variable | ++---------+--------+----------+ + +* *wr_cnt* is the number of device writes coalesced in the message +* *wrs* is an array of device writes defined below + +Single Device Write Format +"""""""""""""""""""""""""" + ++--------+--------+----------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D+ +| offset | 0 | 8 | ++--------+--------+----------+ +| region | 8 | 4 | ++--------+--------+----------+ +| count | 12 | 4 | ++--------+--------+----------+ +| data | 16 | 8 | ++--------+--------+----------+ + +* *offset* into the region being accessed. +* *region* is the index of the region being accessed. +* *count* is the size of the data to be transferred. This format can + only describe writes of 8 bytes or less. +* *data* is the data to write. + +Reply +^^^^^ + ++---------+--------+----------+ +| Name | Offset | Size | ++=3D=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D=3D=3D+=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D+ +| wr_cnt | 0 | 8 | ++---------+--------+----------+ + +* *wr_cnt* is the number of device writes completed. + + +Appendices +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +Unused VFIO ``ioctl()`` commands +-------------------------------- + +The following VFIO commands do not have an equivalent vfio-user command: + +* ``VFIO_GET_API_VERSION`` +* ``VFIO_CHECK_EXTENSION`` +* ``VFIO_SET_IOMMU`` +* ``VFIO_GROUP_GET_STATUS`` +* ``VFIO_GROUP_SET_CONTAINER`` +* ``VFIO_GROUP_UNSET_CONTAINER`` +* ``VFIO_GROUP_GET_DEVICE_FD`` +* ``VFIO_IOMMU_GET_INFO`` + +However, once support for live migration for VFIO devices is finalized some +of the above commands may have to be handled by the client in their +corresponding vfio-user form. This will be addressed in a future protocol +version. 
+
+VFIO groups and containers
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The current VFIO implementation includes group and container idioms that
+describe how a device relates to the host IOMMU. In the vfio-user
+implementation, the IOMMU is implemented in software by the client, and is not
+visible to the server. The simplest approach is for the client to put each
+device into its own group and container.
+
+Backend Program Conventions
+---------------------------
+
+vfio-user backend program conventions are based on the vhost-user ones.
+
+* The backend program must not daemonize itself.
+* The backend program must make no assumptions about what access it has on
+  the system.
+* File descriptors 0, 1 and 2 must exist, must have regular
+  stdin/stdout/stderr semantics, and can be redirected.
+* The backend program must honor the SIGTERM signal.
+* The backend program must accept the following command-line options:
+
+  * ``--socket-path=PATH``: path to the UNIX domain socket.
+  * ``--fd=FDNUM``: file descriptor for the UNIX domain socket; incompatible
+    with ``--socket-path``.
+
+* The backend program must be accompanied by a JSON file stored under
+  ``/usr/share/vfio-user``.
+
+TODO add schema similar to docs/interop/vhost-user.json.
diff --git a/MAINTAINERS b/MAINTAINERS
index 738c4eb..999340d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1984,6 +1984,12 @@ F: hw/vfio/ap.c
 F: docs/system/s390x/vfio-ap.rst
 L: qemu-s390x@nongnu.org
 
+vfio-user
+M: John G Johnson
+M: Thanos Makatos
+S: Supported
+F: docs/devel/vfio-user.rst
+
 vhost
 M: Michael S. Tsirkin
 S: Supported
-- 
1.9.4
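The backend program conventions in the specification above require a ``--socket-path=PATH`` option and an ``--fd=FDNUM`` option that is incompatible with it. A minimal sketch of that option handling follows, using ``getopt_long()`` as provided by glibc and the BSDs (it is not part of POSIX). The ``backend_opts`` structure and ``parse_backend_opts()`` function are hypothetical, and treating "neither option given" as an error is an assumption; the conventions only say that both options must be accepted.

```c
#include <getopt.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical parsed configuration for a vfio-user backend program. */
struct backend_opts {
    const char *socket_path;  /* --socket-path=PATH, or NULL */
    int fd;                   /* --fd=FDNUM, or -1 */
};

/* Hypothetical parser: returns 0 on success, -1 on a usage error. */
int parse_backend_opts(int argc, char **argv, struct backend_opts *o)
{
    static const struct option longopts[] = {
        { "socket-path", required_argument, NULL, 's' },
        { "fd",          required_argument, NULL, 'f' },
        { NULL, 0, NULL, 0 },
    };
    int c;

    o->socket_path = NULL;
    o->fd = -1;
    optind = 1;               /* restart scanning so repeated calls work */
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case 's':
            o->socket_path = optarg;
            break;
        case 'f': {
            char *end;
            long v = strtol(optarg, &end, 10);

            if (*optarg == '\0' || *end != '\0' || v < 0 || v > INT_MAX) {
                fprintf(stderr, "invalid --fd value: %s\n", optarg);
                return -1;
            }
            o->fd = (int)v;
            break;
        }
        default:
            return -1;        /* getopt_long() already printed a message */
        }
    }
    /* Assumed policy: exactly one of the two options must be given. */
    if ((o->socket_path != NULL) == (o->fd >= 0)) {
        fprintf(stderr, "exactly one of --socket-path or --fd is required\n");
        return -1;
    }
    return 0;
}
```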
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 02/23] vfio-user: add VFIO base abstract class
Date: Wed, 1 Feb 2023 21:55:38 -0800

Add an abstract base class that both the kernel driver and user socket
implementations can use to share code.

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 hw/vfio/pci.h |  16 +++++++--
 hw/vfio/pci.c | 106 +++++++++++++++++++++++++++++++++++-----------------------
 2 files changed, 78 insertions(+), 44 deletions(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 7c236a5..7fb656c 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -115,8 +115,13 @@ typedef struct VFIOMSIXInfo {
     unsigned long *pending;
 } VFIOMSIXInfo;
 
-#define TYPE_VFIO_PCI "vfio-pci"
-OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI)
+/*
+ * TYPE_VFIO_PCI_BASE is an abstract type used to share code
+ * between VFIO implementations that use a kernel driver
+ * and those that use user sockets.
+ */
+#define TYPE_VFIO_PCI_BASE "vfio-pci-base"
+OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI_BASE)
 
 struct VFIOPCIDevice {
     PCIDevice pdev;
@@ -177,6 +182,13 @@ struct VFIOPCIDevice {
     Notifier irqchip_change_notifier;
 };
 
+#define TYPE_VFIO_PCI "vfio-pci"
+OBJECT_DECLARE_SIMPLE_TYPE(VFIOKernelPCIDevice, VFIO_PCI)
+
+struct VFIOKernelPCIDevice {
+    VFIOPCIDevice device;
+};
+
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
 static inline bool vfio_pci_is(VFIOPCIDevice *vdev, uint32_t vendor, uint32_t device)
 {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 939dcc3..9d70114 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -235,7 +235,7 @@ static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
 
 static void vfio_intx_routing_notifier(PCIDevice *pdev)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     PCIINTxRoute route;
 
     if (vdev->interrupt != VFIO_INT_INTx) {
@@ -467,7 +467,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIOMSIVector *vector;
     int ret;
 
@@ -561,7 +561,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 
 static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
     trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
@@ -1109,7 +1109,7 @@ static const MemoryRegionOps vfio_vga_ops = {
  */
 static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIORegion *region = &vdev->bars[bar].region;
     MemoryRegion *mmap_mr, *region_mr, *base_mr;
     PCIIORegion *r;
@@ -1155,7 +1155,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
  */
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
 
     memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -1188,7 +1188,7 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 void vfio_pci_write_config(PCIDevice *pdev,
                            uint32_t addr, uint32_t val, int len)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     uint32_t val_le = cpu_to_le32(val);
 
     trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
@@ -2845,7 +2845,7 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
 
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIODevice *vbasedev = &vdev->vbasedev;
     VFIODevice *vbasedev_iter;
     VFIOGroup *group;
@@ -3169,7 +3169,7 @@ error:
 
 static void vfio_instance_finalize(Object *obj)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(obj);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
     VFIOGroup *group = vdev->vbasedev.group;
 
     vfio_display_finalize(vdev);
@@ -3189,7 +3189,7 @@ static void vfio_instance_finalize(Object *obj)
 
 static void vfio_exitfn(PCIDevice *pdev)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
 
     vfio_unregister_req_notifier(vdev);
     vfio_unregister_err_notifier(vdev);
@@ -3208,7 +3208,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 
 static void vfio_pci_reset(DeviceState *dev)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(dev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
 
     trace_vfio_pci_reset(vdev->vbasedev.name);
 
@@ -3248,7 +3248,7 @@ post_reset:
 static void vfio_instance_init(Object *obj)
 {
     PCIDevice *pci_dev = PCI_DEVICE(obj);
-    VFIOPCIDevice *vdev = VFIO_PCI(obj);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
 
     device_add_bootindex_property(obj, &vdev->bootindex,
                                   "bootindex", NULL,
@@ -3265,24 +3265,12 @@ static void vfio_instance_init(Object *obj)
     pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
-static Property vfio_pci_dev_properties[] = {
-    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
-    DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
+static Property vfio_pci_base_dev_properties[] = {
     DEFINE_PROP_ON_OFF_AUTO("x-pre-copy-dirty-page-tracking", VFIOPCIDevice,
                             vbasedev.pre_copy_dirty_page_tracking,
                             ON_OFF_AUTO_ON),
-    DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
-                            display, ON_OFF_AUTO_OFF),
-    DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
-    DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),
     DEFINE_PROP_UINT32("x-intx-mmap-timeout-ms", VFIOPCIDevice,
                        intx.mmap_timeout, 1100),
-    DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
-                    VFIO_FEATURE_ENABLE_VGA_BIT, false),
-    DEFINE_PROP_BIT("x-req", VFIOPCIDevice, features,
-                    VFIO_FEATURE_ENABLE_REQ_BIT, true),
-    DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
-                    VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
     DEFINE_PROP_BOOL("x-enable-migration", VFIOPCIDevice,
                      vbasedev.enable_migration, false),
     DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
@@ -3291,8 +3279,6 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false),
     DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false),
     DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),
-    DEFINE_PROP_BOOL("x-no-geforce-quirks", VFIOPCIDevice,
-                     no_geforce_quirks, false),
     DEFINE_PROP_BOOL("x-no-kvm-ioeventfd", VFIOPCIDevice, no_kvm_ioeventfd,
                      false),
     DEFINE_PROP_BOOL("x-no-vfio-ioeventfd", VFIOPCIDevice, no_vfio_ioeventfd,
@@ -3303,10 +3289,6 @@ static Property vfio_pci_dev_properties[] = {
                        sub_vendor_id, PCI_ANY_ID),
     DEFINE_PROP_UINT32("x-pci-sub-device-id", VFIOPCIDevice,
                        sub_device_id, PCI_ANY_ID),
-    DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
-    DEFINE_PROP_UNSIGNED_NODEFAULT("x-nv-gpudirect-clique", VFIOPCIDevice,
-                                   nv_gpudirect_clique,
-                                   qdev_prop_nv_gpudirect_clique, uint8_t),
     DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
                                 OFF_AUTOPCIBAR_OFF),
     /*
@@ -3317,28 +3299,25 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
+static void vfio_pci_base_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
 
-    dc->reset = vfio_pci_reset;
-    device_class_set_props(dc, vfio_pci_dev_properties);
-    dc->desc = "VFIO-based PCI device assignment";
+    device_class_set_props(dc, vfio_pci_base_dev_properties);
+    dc->desc = "VFIO PCI base device";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
-    pdc->realize = vfio_realize;
     pdc->exit = vfio_exitfn;
     pdc->config_read = vfio_pci_read_config;
     pdc->config_write = vfio_pci_write_config;
 }
 
-static const TypeInfo vfio_pci_dev_info = {
-    .name = TYPE_VFIO_PCI,
+static const TypeInfo vfio_pci_base_dev_info = {
+    .name = TYPE_VFIO_PCI_BASE,
     .parent = TYPE_PCI_DEVICE,
-    .instance_size = sizeof(VFIOPCIDevice),
-    .class_init = vfio_pci_dev_class_init,
-    .instance_init = vfio_instance_init,
-    .instance_finalize = vfio_instance_finalize,
+    .instance_size = 0,
+    .abstract = true,
+    .class_init = vfio_pci_base_dev_class_init,
     .interfaces = (InterfaceInfo[]) {
         { INTERFACE_PCIE_DEVICE },
         { INTERFACE_CONVENTIONAL_PCI_DEVICE },
@@ -3346,6 +3325,48 @@ static const TypeInfo vfio_pci_dev_info = {
     },
 };
 
+static Property vfio_pci_dev_properties[] = {
+    DEFINE_PROP_PCI_HOST_DEVADDR("host", VFIOPCIDevice, host),
+    DEFINE_PROP_STRING("sysfsdev", VFIOPCIDevice, vbasedev.sysfsdev),
+    DEFINE_PROP_ON_OFF_AUTO("display", VFIOPCIDevice,
+                            display, ON_OFF_AUTO_OFF),
+    DEFINE_PROP_UINT32("xres", VFIOPCIDevice, display_xres, 0),
+    DEFINE_PROP_UINT32("yres", VFIOPCIDevice, display_yres, 0),
+    DEFINE_PROP_BIT("x-vga", VFIOPCIDevice, features,
+                    VFIO_FEATURE_ENABLE_VGA_BIT, false),
+    DEFINE_PROP_BIT("x-req", VFIOPCIDevice, features,
+                    VFIO_FEATURE_ENABLE_REQ_BIT, true),
+    DEFINE_PROP_BIT("x-igd-opregion", VFIOPCIDevice, features,
+                    VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
+    DEFINE_PROP_BOOL("x-no-geforce-quirks", VFIOPCIDevice,
+                     no_geforce_quirks, false),
+    DEFINE_PROP_UINT32("x-igd-gms", VFIOPCIDevice, igd_gms, 0),
+    DEFINE_PROP_UNSIGNED_NODEFAULT("x-nv-gpudirect-clique", VFIOPCIDevice,
+                                   nv_gpudirect_clique,
+                                   qdev_prop_nv_gpudirect_clique, uint8_t),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
+
+    dc->reset = vfio_pci_reset;
+    device_class_set_props(dc, vfio_pci_dev_properties);
+    dc->desc = "VFIO-based PCI device assignment";
+    pdc->realize = vfio_realize;
+}
+
+static const TypeInfo vfio_pci_dev_info = {
+    .name = TYPE_VFIO_PCI,
+    .parent = TYPE_VFIO_PCI_BASE,
+    .instance_size = sizeof(VFIOKernelPCIDevice),
+    .class_init = vfio_pci_dev_class_init,
+    .instance_init = vfio_instance_init,
+    .instance_finalize = vfio_instance_finalize,
+};
+
 static Property vfio_pci_dev_nohotplug_properties[] = {
     DEFINE_PROP_BOOL("ramfb", VFIOPCIDevice, enable_ramfb, false),
     DEFINE_PROP_END_OF_LIST(),
@@ -3362,12 +3383,13 @@ static void vfio_pci_nohotplug_dev_class_init(ObjectClass *klass, void *data)
 static const TypeInfo vfio_pci_nohotplug_dev_info = {
     .name = TYPE_VFIO_PCI_NOHOTPLUG,
     .parent = TYPE_VFIO_PCI,
-    .instance_size = sizeof(VFIOPCIDevice),
+    .instance_size = sizeof(VFIOKernelPCIDevice),
     .class_init = vfio_pci_nohotplug_dev_class_init,
 };
 
 static void register_vfio_pci_dev_type(void)
 {
+    type_register_static(&vfio_pci_base_dev_info);
     type_register_static(&vfio_pci_dev_info);
     type_register_static(&vfio_pci_nohotplug_dev_info);
 }
-- 
1.9.4
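Patch 02 uses QOM to make ``vfio-pci-base`` an abstract type, to have ``VFIOKernelPCIDevice`` embed ``VFIOPCIDevice`` as its first member, and to supply ``realize`` only in the concrete ``vfio-pci`` type. The plain C sketch below shows the same single-inheritance idea without QOM, for readers who want the underlying mechanism rather than QEMU's API: shared code takes a pointer to the base struct, the concrete type embeds the base, and the derived object is recovered with an ``offsetof()``-based cast in the style of ``container_of()``. Every name in it is hypothetical, and QOM's run-time checked casts are omitted.

```c
#include <assert.h>
#include <stddef.h>

struct pci_base;

/* Hypothetical per-type hooks, playing the role of a QOM class. */
struct pci_base_ops {
    const char *desc;
    int (*realize)(struct pci_base *dev);  /* NULL in the abstract base */
};

/* Hypothetical base type shared by every transport. */
struct pci_base {
    const struct pci_base_ops *ops;
    unsigned config_reads;                 /* shared bookkeeping */
};

/* Shared code is written once, against the base type only. */
unsigned base_read_config(struct pci_base *dev)
{
    return ++dev->config_reads;
}

/* Generic entry point: dispatch through the hooks, never on the concrete type. */
int base_realize(struct pci_base *dev)
{
    assert(dev != NULL && dev->ops != NULL);
    /* An abstract type has no realize hook, so it cannot be realized. */
    return dev->ops->realize ? dev->ops->realize(dev) : -1;
}

/* Hypothetical concrete type: embeds the base as its first member. */
struct kernel_pci_dev {
    struct pci_base base;
    int group_fd;                          /* transport-specific state */
};

/* Recover the derived object from a base pointer, container_of() style. */
#define KERNEL_PCI(b) \
    ((struct kernel_pci_dev *)((char *)(b) - offsetof(struct kernel_pci_dev, base)))

static int kernel_realize(struct pci_base *b)
{
    struct kernel_pci_dev *k = KERNEL_PCI(b);

    return k->group_fd >= 0 ? 0 : -1;
}

const struct pci_base_ops kernel_pci_ops = {
    .desc = "kernel driver backed device",
    .realize = kernel_realize,
};

const struct pci_base_ops abstract_base_ops = {
    .desc = "base device",
    .realize = NULL,
};
```

A second transport would add its own struct that embeds ``struct pci_base`` and its own ops table, reusing ``base_read_config()`` unchanged.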
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 03/23] vfio-user: add container IO ops vector
Date: Wed, 1 Feb 2023 21:55:39 -0800
Message-Id: <3648002c52cef9b4473f97d18cb7e2cd62fc3fd5.1675228037.git.john.g.johnson@oracle.com>

Used for communication with the VFIO driver (prep work for vfio-user,
which will communicate over a socket).

Signed-off-by: John G Johnson
---
 include/hw/vfio/vfio-common.h |  24 ++++++++
 hw/vfio/common.c              | 128 ++++++++++++++++++++++++++++--------------
 2 files changed, 110 insertions(+), 42 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e573f5a..953bc0f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -75,6 +75,7 @@ typedef struct VFIOAddressSpace {
 } VFIOAddressSpace;
 
 struct VFIOGroup;
+typedef struct VFIOContainerIO VFIOContainerIO;
 
 typedef struct VFIOContainer {
     VFIOAddressSpace *space;
@@ -83,6 +84,7 @@ typedef struct VFIOContainer {
     MemoryListener prereg_listener;
     unsigned iommu_type;
     Error *error;
+    VFIOContainerIO *io;
     bool initialized;
     bool dirty_pages_supported;
     uint64_t dirty_pgsizes;
@@ -154,6 +156,28 @@ struct VFIODeviceOps {
     int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
 };
 
+#ifdef CONFIG_LINUX
+
+/*
+ * The next 2 ops vectors are how Devices and Containers
+ * communicate with the server.  The default option is
+ * through ioctl() to the kernel VFIO driver, but vfio-user
+ * can use a socket to a remote process.
+ */
+
+struct VFIOContainerIO {
+    int (*dma_map)(VFIOContainer *container,
+                   struct vfio_iommu_type1_dma_map *map);
+    int (*dma_unmap)(VFIOContainer *container,
+                     struct vfio_iommu_type1_dma_unmap *unmap,
+                     struct vfio_bitmap *bitmap);
+    int (*dirty_bitmap)(VFIOContainer *container,
+                        struct vfio_iommu_type1_dirty_bitmap *bitmap,
+                        struct vfio_iommu_type1_dirty_bitmap_get *range);
+};
+
+#endif /* CONFIG_LINUX */
+
 typedef struct VFIOGroup {
     int fd;
     int groupid;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ace9562..9310a7f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -58,6 +58,8 @@ static QLIST_HEAD(, VFIOAddressSpace) vfio_address_spaces =
 static int vfio_kvm_device_fd = -1;
 #endif
 
+static VFIOContainerIO vfio_cont_io_ioctl;
+
 /*
  * Common VFIO interrupt disable
  */
@@ -432,12 +434,12 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
         goto unmap_exit;
     }
 
-    ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
+    ret = container->io->dma_unmap(container, unmap, bitmap);
     if (!ret) {
         cpu_physical_memory_set_dirty_lebitmap((unsigned long *)bitmap->data,
                 iotlb->translated_addr, pages);
     } else {
-        error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %m");
+        error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %s", strerror(-ret));
     }
 
     g_free(bitmap->data);
@@ -465,30 +467,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
         return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
     }
 
-    while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
-        /*
-         * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
-         * v4.15) where an overflow in its wrap-around check prevents us from
-         * unmapping the last page of the address space.  Test for the error
-         * condition and re-try the unmap excluding the last page.  The
-         * expectation is that we've never mapped the last page anyway and this
-         * unmap request comes via vIOMMU support which also makes it unlikely
-         * that this page is used.  This bug was introduced well after type1 v2
-         * support was introduced, so we shouldn't need to test for v1.  A fix
-         * is queued for kernel v5.0 so this workaround can be removed once
-         * affected kernels are sufficiently deprecated.
-         */
-        if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
-            container->iommu_type == VFIO_TYPE1v2_IOMMU) {
-            trace_vfio_dma_unmap_overflow_workaround();
-            unmap.size -= 1ULL << ctz64(container->pgsizes);
-            continue;
-        }
-        error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
-        return -errno;
-    }
-
-    return 0;
+    return container->io->dma_unmap(container, &unmap, NULL);
 }
 
 static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
@@ -501,24 +480,18 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         .iova = iova,
         .size = size,
     };
+    int ret;
 
     if (!readonly) {
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }
 
-    /*
-     * Try the mapping, if it fails with EBUSY, unmap the region and try
-     * again.  This shouldn't be necessary, but we sometimes see it in
-     * the VGA ROM space.
- */ - if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) =3D=3D 0 || - (errno =3D=3D EBUSY && vfio_dma_unmap(container, iova, size, NULL)= =3D=3D 0 && - ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) =3D=3D 0)) { - return 0; - } + ret =3D container->io->dma_map(container, &map); =20 - error_report("VFIO_MAP_DMA failed: %s", strerror(errno)); - return -errno; + if (ret < 0) { + error_report("VFIO_MAP_DMA failed: %s", strerror(-ret)); + } + return ret; } =20 static void vfio_host_win_add(VFIOContainer *container, @@ -1263,10 +1236,10 @@ static void vfio_set_dirty_page_tracking(VFIOContai= ner *container, bool start) dirty.flags =3D VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP; } =20 - ret =3D ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty); + ret =3D container->io->dirty_bitmap(container, &dirty, NULL); if (ret) { error_report("Failed to set dirty tracking flag 0x%x errno: %d", - dirty.flags, errno); + dirty.flags, -ret); } } =20 @@ -1316,11 +1289,11 @@ static int vfio_get_dirty_bitmap(VFIOContainer *con= tainer, uint64_t iova, goto err_out; } =20 - ret =3D ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap); + ret =3D container->io->dirty_bitmap(container, dbitmap, range); if (ret) { error_report("Failed to get dirty bitmap for iova: 0x%"PRIx64 " size: 0x%"PRIx64" err: %d", (uint64_t)range->iova, - (uint64_t)range->size, errno); + (uint64_t)range->size, -ret); goto err_out; } =20 @@ -2090,6 +2063,7 @@ static int vfio_connect_container(VFIOGroup *group, A= ddressSpace *as, container->error =3D NULL; container->dirty_pages_supported =3D false; container->dma_max_mappings =3D 0; + container->io =3D &vfio_cont_io_ioctl; QLIST_INIT(&container->giommu_list); QLIST_INIT(&container->hostwin_list); QLIST_INIT(&container->vrdl_list); @@ -2626,3 +2600,73 @@ int vfio_eeh_as_op(AddressSpace *as, uint32_t op) } return vfio_eeh_container_op(container, op); } + +/* + * Traditional ioctl() based io + */ + +static int vfio_io_dma_map(VFIOContainer *container, + struct 
vfio_iommu_type1_dma_map *map)
+{
+
+    /*
+     * Try the mapping, if it fails with EBUSY, unmap the region and try
+     * again. This shouldn't be necessary, but we sometimes see it in
+     * the VGA ROM space.
+     */
+    if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, map) == 0 ||
+        (errno == EBUSY &&
+         vfio_dma_unmap(container, map->iova, map->size, NULL) == 0 &&
+         ioctl(container->fd, VFIO_IOMMU_MAP_DMA, map) == 0)) {
+        return 0;
+    }
+    return -errno;
+}
+
+static int vfio_io_dma_unmap(VFIOContainer *container,
+                             struct vfio_iommu_type1_dma_unmap *unmap,
+                             struct vfio_bitmap *bitmap)
+{
+
+    while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap)) {
+        /*
+         * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
+         * v4.15) where an overflow in its wrap-around check prevents us from
+         * unmapping the last page of the address space. Test for the error
+         * condition and re-try the unmap excluding the last page. The
+         * expectation is that we've never mapped the last page anyway and this
+         * unmap request comes via vIOMMU support which also makes it unlikely
+         * that this page is used. This bug was introduced well after type1 v2
+         * support was introduced, so we shouldn't need to test for v1. A fix
+         * is queued for kernel v5.0 so this workaround can be removed once
+         * affected kernels are sufficiently deprecated.
+         */
+        if (errno == EINVAL && unmap->size && !(unmap->iova + unmap->size) &&
+            container->iommu_type == VFIO_TYPE1v2_IOMMU) {
+            trace_vfio_dma_unmap_overflow_workaround();
+            unmap->size -= 1ULL << ctz64(container->pgsizes);
+            continue;
+        }
+        error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
+        return -errno;
+    }
+
+    return 0;
+}
+
+static int vfio_io_dirty_bitmap(VFIOContainer *container,
+                                struct vfio_iommu_type1_dirty_bitmap *bitmap,
+                                struct vfio_iommu_type1_dirty_bitmap_get *range)
+{
+    int ret;
+
+    ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, bitmap);
+
+    return ret < 0 ?
-errno : ret;
+}
+
+static VFIOContainerIO vfio_cont_io_ioctl = {
+    .dma_map = vfio_io_dma_map,
+    .dma_unmap = vfio_io_dma_unmap,
+    .dirty_bitmap = vfio_io_dirty_bitmap,
+};
-- 
1.9.4

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject:
[PATCH v2 04/23] vfio-user: add region cache
Date: Wed, 1 Feb 2023 21:55:40 -0800
Message-Id: <9a44eb8b8d2737a2655059f796104e64a3cb1960.1675228037.git.john.g.johnson@oracle.com>

cache VFIO_DEVICE_GET_REGION_INFO results to
reduce memory alloc/free cycles and as prep work for vfio-user

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/ccw.c                 |  5 -----
 hw/vfio/common.c              | 41 +++++++++++++++++++++++++++++++++++------
 hw/vfio/igd.c                 | 23 +++++++++-------------
 hw/vfio/migration.c           |  2 --
 hw/vfio/pci-quirks.c          | 19 +++++--------------
 hw/vfio/pci.c                 |  8 --------
 7 files changed, 51 insertions(+), 49 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 953bc0f..7779cc7 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -145,6 +145,7 @@ typedef struct VFIODevice {
     VFIOMigration *migration;
     Error *migration_blocker;
     OnOffAuto pre_copy_dirty_page_tracking;
+    struct vfio_region_info **regions;
 } VFIODevice;
 
 struct VFIODeviceOps {
@@ -249,6 +250,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
                          struct vfio_region_info **info);
 int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
                              uint32_t subtype, struct vfio_region_info **info);
+void vfio_get_all_regions(VFIODevice *vbasedev);
 bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
 struct vfio_info_cap_header *
 vfio_get_region_info_cap(struct vfio_region_info *info, uint16_t id);
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 0354737..06b588c 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -517,7 +517,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
 
     vcdev->io_region_offset = info->offset;
     vcdev->io_region = g_malloc0(info->size);
-    g_free(info);
 
     /* check for the optional async command region */
     ret = vfio_get_dev_region_info(vdev, VFIO_REGION_TYPE_CCW,
@@ -530,7 +529,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
         }
         vcdev->async_cmd_region_offset = info->offset;
         vcdev->async_cmd_region = g_malloc0(info->size);
-        g_free(info);
     }
 
     ret =
vfio_get_dev_region_info(vdev, VFIO_REGION_TYPE_CCW,
@@ -543,7 +541,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
         }
         vcdev->schib_region_offset = info->offset;
         vcdev->schib_region = g_malloc(info->size);
-        g_free(info);
     }
 
     ret = vfio_get_dev_region_info(vdev, VFIO_REGION_TYPE_CCW,
@@ -557,7 +554,6 @@ static void vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
         }
         vcdev->crw_region_offset = info->offset;
         vcdev->crw_region = g_malloc(info->size);
-        g_free(info);
     }
 
     return;
@@ -567,7 +563,6 @@ out_err:
     g_free(vcdev->schib_region);
     g_free(vcdev->async_cmd_region);
     g_free(vcdev->io_region);
-    g_free(info);
     return;
 }
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9310a7f..f895b51 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1602,8 +1602,6 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
         }
     }
 
-    g_free(info);
-
     trace_vfio_region_setup(vbasedev->name, index, name, region->flags,
                             region->fd_offset, region->size);
     return 0;
@@ -2359,6 +2357,16 @@ void vfio_put_group(VFIOGroup *group)
     }
 }
 
+void vfio_get_all_regions(VFIODevice *vbasedev)
+{
+    struct vfio_region_info *info;
+    int i;
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        vfio_get_region_info(vbasedev, i, &info);
+    }
+}
+
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp)
 {
@@ -2414,12 +2422,23 @@ int vfio_get_device(VFIOGroup *group, const char *name,
     trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
                           dev_info.num_irqs);
 
+    vfio_get_all_regions(vbasedev);
     vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
     return 0;
 }
 
 void vfio_put_base_device(VFIODevice *vbasedev)
 {
+    if (vbasedev->regions != NULL) {
+        int i;
+
+        for (i = 0; i < vbasedev->num_regions; i++) {
+            g_free(vbasedev->regions[i]);
+        }
+        g_free(vbasedev->regions);
+        vbasedev->regions = NULL;
+    }
+
     if (!vbasedev->group) {
         return;
     }
@@ -2434,6 +2453,17
@@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
 {
     size_t argsz = sizeof(struct vfio_region_info);
 
+    /* create region cache */
+    if (vbasedev->regions == NULL) {
+        vbasedev->regions = g_new0(struct vfio_region_info *,
+                                   vbasedev->num_regions);
+    }
+    /* check cache */
+    if (vbasedev->regions[index] != NULL) {
+        *info = vbasedev->regions[index];
+        return 0;
+    }
+
     *info = g_malloc0(argsz);
 
     (*info)->index = index;
@@ -2453,6 +2483,9 @@ retry:
         goto retry;
     }
 
+    /* fill cache */
+    vbasedev->regions[index] = *info;
+
     return 0;
 }
 
@@ -2471,7 +2504,6 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
 
         hdr = vfio_get_region_info_cap(*info, VFIO_REGION_INFO_CAP_TYPE);
         if (!hdr) {
-            g_free(*info);
             continue;
         }
 
@@ -2483,8 +2515,6 @@ int vfio_get_dev_region_info(VFIODevice *vbasedev, uint32_t type,
         if (cap_type->type == type && cap_type->subtype == subtype) {
             return 0;
         }
-
-        g_free(*info);
     }
 
     *info = NULL;
@@ -2500,7 +2530,6 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
         if (vfio_get_region_info_cap(info, cap_type)) {
             ret = true;
         }
-        g_free(info);
     }
 
     return ret;
diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index afe3fe7..22efa1a 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -425,7 +425,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if ((ret || !rom->size) && !vdev->pdev.romfile) {
         error_report("IGD device %s has no ROM, legacy mode disabled",
                      vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     /*
@@ -436,7 +436,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         error_report("IGD device %s hotplugged, ROM disabled, "
                      "legacy mode disabled", vdev->vbasedev.name);
         vdev->rom_read_failed = true;
-        goto out;
+        return;
     }
 
     /*
@@ -449,7 +449,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s does not support OpRegion access,"
                      "legacy mode disabled",
vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     ret = vfio_get_dev_region_info(&vdev->vbasedev,
@@ -458,7 +458,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s does not support host bridge access,"
                      "legacy mode disabled", vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     ret = vfio_get_dev_region_info(&vdev->vbasedev,
@@ -467,7 +467,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s does not support LPC bridge access,"
                      "legacy mode disabled", vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     gmch = vfio_pci_read_config(&vdev->pdev, IGD_GMCH, 4);
@@ -481,7 +481,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
         error_report("IGD device %s failed to enable VGA access, "
                      "legacy mode disabled", vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     /* Create our LPC/ISA bridge */
@@ -489,7 +489,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s failed to create LPC bridge, "
                      "legacy mode disabled", vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     /* Stuff some host values into the VM PCI host bridge */
@@ -497,7 +497,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_report("IGD device %s failed to modify host bridge, "
                      "legacy mode disabled", vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     /* Setup OpRegion access */
@@ -505,7 +505,7 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
     if (ret) {
         error_append_hint(&err, "IGD legacy mode disabled\n");
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
-        goto out;
+        return;
     }
 
     /* Setup our quirk to munge GTT addresses to the VM allocated buffer */
@@ -608,9 +608,4 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr)
 
     trace_vfio_pci_igd_bdsm_enabled(vdev->vbasedev.name, ggms_mb + gms_mb);
 
-out:
-
g_free(rom);
-    g_free(opregion);
-    g_free(host);
-    g_free(lpc);
 }
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a6ad1f8..397be43 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -879,13 +879,11 @@ int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
     }
 
     trace_vfio_migration_probe(vbasedev->name, info->index);
-    g_free(info);
     return 0;
 
 add_blocker:
     error_setg(&vbasedev->migration_blocker,
                "VFIO device doesn't support migration");
-    g_free(info);
 
     ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
     if (ret < 0) {
diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
index f0147a0..c04ee19 100644
--- a/hw/vfio/pci-quirks.c
+++ b/hw/vfio/pci-quirks.c
@@ -1585,16 +1585,14 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
 
     hdr = vfio_get_region_info_cap(nv2reg, VFIO_REGION_INFO_CAP_NVLINK2_SSATGT);
     if (!hdr) {
-        ret = -ENODEV;
-        goto free_exit;
+        return -ENODEV;
     }
     cap = (void *) hdr;
 
     p = mmap(NULL, nv2reg->size, PROT_READ | PROT_WRITE,
              MAP_SHARED, vdev->vbasedev.fd, nv2reg->offset);
     if (p == MAP_FAILED) {
-        ret = -errno;
-        goto free_exit;
+        return -errno;
     }
 
     quirk = vfio_quirk_alloc(1);
@@ -1607,8 +1605,6 @@ int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp)
                         OBJ_PROP_FLAG_READ);
     trace_vfio_pci_nvidia_gpu_setup_quirk(vdev->vbasedev.name, cap->tgt,
                                           nv2reg->size);
-free_exit:
-    g_free(nv2reg);
 
     return ret;
 }
@@ -1635,16 +1631,14 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
     hdr = vfio_get_region_info_cap(atsdreg,
                                    VFIO_REGION_INFO_CAP_NVLINK2_SSATGT);
     if (!hdr) {
-        ret = -ENODEV;
-        goto free_exit;
+        return -ENODEV;
     }
     captgt = (void *) hdr;
 
     hdr = vfio_get_region_info_cap(atsdreg,
                                    VFIO_REGION_INFO_CAP_NVLINK2_LNKSPD);
     if (!hdr) {
-        ret = -ENODEV;
-        goto free_exit;
+        return -ENODEV;
     }
     capspeed = (void *) hdr;
 
@@ -1653,8 +1647,7 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
     p =
mmap(NULL, atsdreg->size, PROT_READ | PROT_WRITE,
              MAP_SHARED, vdev->vbasedev.fd, atsdreg->offset);
     if (p == MAP_FAILED) {
-        ret = -errno;
-        goto free_exit;
+        return -errno;
     }
 
     quirk = vfio_quirk_alloc(1);
@@ -1674,8 +1667,6 @@ int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp)
                         OBJ_PROP_FLAG_READ);
     trace_vfio_pci_nvlink2_setup_quirk_lnkspd(vdev->vbasedev.name,
                                               capspeed->link_speed);
-free_exit:
-    g_free(atsdreg);
 
     return ret;
 }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9d70114..b214a93 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -836,8 +836,6 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     vdev->rom_size = size = reg_info->size;
     vdev->rom_offset = reg_info->offset;
 
-    g_free(reg_info);
-
     if (!vdev->rom_size) {
         vdev->rom_read_failed = true;
         error_report("vfio-pci: Cannot read device rom at "
@@ -2564,7 +2562,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
         error_setg(errp, "unexpected VGA info, flags 0x%lx, size 0x%lx",
                    (unsigned long)reg_info->flags,
                    (unsigned long)reg_info->size);
-        g_free(reg_info);
         return -EINVAL;
     }
 
@@ -2573,8 +2570,6 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
     vdev->vga->fd_offset = reg_info->offset;
     vdev->vga->fd = vdev->vbasedev.fd;
 
-    g_free(reg_info);
-
     vdev->vga->region[QEMU_PCI_VGA_MEM].offset = QEMU_PCI_VGA_MEM_BASE;
     vdev->vga->region[QEMU_PCI_VGA_MEM].nr = QEMU_PCI_VGA_MEM;
     QLIST_INIT(&vdev->vga->region[QEMU_PCI_VGA_MEM].quirks);
@@ -2669,8 +2664,6 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
     }
     vdev->config_offset = reg_info->offset;
 
-    g_free(reg_info);
-
     if (vdev->features & VFIO_FEATURE_ENABLE_VGA) {
         ret = vfio_populate_vga(vdev, errp);
         if (ret) {
@@ -3079,7 +3072,6 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         }
 
         ret = vfio_pci_igd_opregion_init(vdev, opregion, errp);
-        g_free(opregion);
         if (ret) {
             goto out_teardown;
         }
-- 
1.9.4

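The region cache in the patch above memoizes VFIO_DEVICE_GET_REGION_INFO results by region index. The first lookup allocates and fills an entry, later lookups return the same pointer, and the whole table is freed once in vfio_put_base_device(). That ownership change is why the callers' g_free() calls are dropped. Below is a minimal standalone sketch of the lazy-cache pattern. It is not QEMU code. It assumes plain calloc()/free() in place of GLib's g_new0()/g_free(). The names region_info, region_cache, fetch_region_info(), get_region_info() and put_region_cache() are all hypothetical, and fetch_region_info() stands in for the ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_region_info. */
struct region_info {
    unsigned int index;
    unsigned long long size;
    unsigned long long offset;
};

/* Hypothetical per-device state: one lazily allocated slot per region. */
struct region_cache {
    int num_regions;
    struct region_info **regions; /* NULL until the first lookup */
    int fetch_count;              /* number of backend queries issued */
};

/* Hypothetical backend query standing in for VFIO_DEVICE_GET_REGION_INFO. */
static int fetch_region_info(struct region_cache *c, int index,
                             struct region_info *info)
{
    c->fetch_count++;
    info->index = (unsigned int)index;
    info->size = 0x1000ULL * (unsigned long long)(index + 1);
    info->offset = (unsigned long long)index << 40;
    return 0;
}

/*
 * Return the cached entry for index, querying the backend only on a miss.
 * The cache owns the returned pointer, so callers must not free it.
 */
static int get_region_info(struct region_cache *c, int index,
                           struct region_info **info)
{
    int ret;

    assert(index >= 0 && index < c->num_regions);

    if (c->regions == NULL) {
        c->regions = calloc((size_t)c->num_regions, sizeof(*c->regions));
        if (c->regions == NULL) {
            return -ENOMEM;
        }
    }
    if (c->regions[index] != NULL) {   /* hit: reuse the stored entry */
        *info = c->regions[index];
        return 0;
    }

    *info = calloc(1, sizeof(**info)); /* miss: allocate and fill */
    if (*info == NULL) {
        return -ENOMEM;
    }
    ret = fetch_region_info(c, index, *info);
    if (ret != 0) {
        free(*info);
        *info = NULL;
        return ret;
    }
    c->regions[index] = *info;
    return 0;
}

/* Free every cached entry exactly once, at device teardown. */
static void put_region_cache(struct region_cache *c)
{
    if (c->regions != NULL) {
        for (int i = 0; i < c->num_regions; i++) {
            free(c->regions[i]);
        }
        free(c->regions);
        c->regions = NULL;
    }
}
```

Because the cache owns every entry, a caller that still freed the returned pointer would leave a dangling slot behind. Moving ownership into the per-device table is what lets the patch remove the g_free() calls in ccw.c, igd.c, migration.c, pci-quirks.c and pci.c.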
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 05/23] vfio-user: add device IO ops vector
Date: Wed, 1 Feb 2023 21:55:41 -0800

Used for communication with VFIO driver (prep work for vfio-user, which
will communicate over a socket)

Signed-off-by: John G Johnson
---
 include/hw/vfio/vfio-common.h |  14 +++++
 hw/vfio/ap.c                  |   1 +
 hw/vfio/ccw.c                 |   1 +
 hw/vfio/common.c              |  85 +++++++++++++++++++++----
 hw/vfio/pci.c                 | 140 ++++++++++++++++++++++++++----------------
 hw/vfio/platform.c            |   1 +
 6 files changed, 178 insertions(+), 64 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7779cc7..c2ff9ea 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -124,6 +124,7 @@ typedef struct VFIOHostDMAWindow {
 } VFIOHostDMAWindow;
 
 typedef struct VFIODeviceOps VFIODeviceOps;
+typedef struct VFIODeviceIO VFIODeviceIO;
 
 typedef struct VFIODevice {
     QLIST_ENTRY(VFIODevice) next;
@@ -139,6 +140,7 @@ typedef struct VFIODevice {
     bool ram_block_discard_allowed;
     bool enable_migration;
     VFIODeviceOps *ops;
+    VFIODeviceIO *io;
     unsigned int num_irqs;
     unsigned int num_regions;
     unsigned int flags;
@@ -165,6 +167,16 @@ struct VFIODeviceOps {
  * through ioctl() to the kernel VFIO driver, but vfio-user
  * can use a socket to a remote process.
  */
+struct VFIODeviceIO {
+    int (*get_region_info)(VFIODevice *vdev,
+                           struct vfio_region_info *info);
+    int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
+    int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
+    int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
+                       void *data);
+    int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
+                        void *data);
+};
 
 struct VFIOContainerIO {
     int (*dma_map)(VFIOContainer *container,
@@ -177,6 +189,8 @@ struct VFIOContainerIO {
                         struct vfio_iommu_type1_dirty_bitmap_get *range);
 };
 
+extern VFIODeviceIO vfio_dev_io_ioctl;
+
 #endif /* CONFIG_LINUX */
 
 typedef struct VFIOGroup {
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index e0dd561..c6638d5 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -102,6 +102,7 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
     mdevid = basename(vapdev->vdev.sysfsdev);
     vapdev->vdev.name = g_strdup_printf("%s", mdevid);
     vapdev->vdev.dev = dev;
+    vapdev->vdev.io = &vfio_dev_io_ioctl;
 
     /*
      * vfio-ap devices operate in a way compatible with discarding of
diff --git a/hw/vfio/ccw.c
b/hw/vfio/ccw.c
index 06b588c..e4d840d 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -614,6 +614,7 @@ static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev,
     vcdev->vdev.type = VFIO_DEVICE_TYPE_CCW;
     vcdev->vdev.name = name;
     vcdev->vdev.dev = &vcdev->cdev.parent_obj.parent_obj;
+    vcdev->vdev.io = &vfio_dev_io_ioctl;
 
     return;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f895b51..45b950a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -73,7 +73,7 @@ void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
         .count = 0,
     };
 
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    vbasedev->io->set_irqs(vbasedev, &irq_set);
 }
 
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
@@ -86,7 +86,7 @@ void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
         .count = 1,
     };
 
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    vbasedev->io->set_irqs(vbasedev, &irq_set);
 }
 
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
@@ -99,7 +99,7 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
         .count = 1,
     };
 
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    vbasedev->io->set_irqs(vbasedev, &irq_set);
 }
 
 static inline const char *action_to_str(int action)
@@ -180,9 +180,7 @@ int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
     pfd = (int32_t *)&irq_set->data;
     *pfd = fd;
 
-    if (ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
-        ret = -errno;
-    }
+    ret = vbasedev->io->set_irqs(vbasedev, irq_set);
     g_free(irq_set);
 
     if (!ret) {
@@ -217,6 +215,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
         uint32_t dword;
         uint64_t qword;
     } buf;
+    int ret;
 
     switch (size) {
     case 1:
@@ -236,13 +235,13 @@ void vfio_region_write(void *opaque, hwaddr addr,
         break;
     }
 
-    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+    ret = vbasedev->io->region_write(vbasedev, region->nr, addr,
size, &buf);
+    if (ret != size) {
         error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
                      ",%d) failed: %m",
                      __func__, vbasedev->name, region->nr,
                      addr, data, size);
     }
-
     trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
 
     /*
@@ -268,13 +267,16 @@ uint64_t vfio_region_read(void *opaque,
         uint64_t qword;
     } buf;
     uint64_t data = 0;
+    int ret;
 
-    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+    ret = vbasedev->io->region_read(vbasedev, region->nr, addr, size, &buf);
+    if (ret != size) {
         error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
                      __func__, vbasedev->name, region->nr,
                      addr, size);
         return (uint64_t)-1;
     }
+
     switch (size) {
     case 1:
         data = buf.byte;
@@ -2452,6 +2454,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
                          struct vfio_region_info **info)
 {
     size_t argsz = sizeof(struct vfio_region_info);
+    int ret;
 
     /* create region cache */
     if (vbasedev->regions == NULL) {
@@ -2470,10 +2473,11 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
 retry:
     (*info)->argsz = argsz;
 
-    if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
+    ret = vbasedev->io->get_region_info(vbasedev, *info);
+    if (ret != 0) {
         g_free(*info);
         *info = NULL;
-        return -errno;
+        return ret;
     }
 
     if ((*info)->argsz > argsz) {
@@ -2634,6 +2638,65 @@ int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
  * Traditional ioctl() based io
  */
 
+static int vfio_io_get_region_info(VFIODevice *vbasedev,
+                                   struct vfio_region_info *info)
+{
+    int ret;
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
+
+    return ret < 0 ?
-errno : ret; +} + +static int vfio_io_set_irqs(VFIODevice *vbasedev, struct vfio_irq_set *irq= s) +{ + int ret; + + ret =3D ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs); + + return ret < 0 ? -errno : ret; +} + +static int vfio_io_region_read(VFIODevice *vbasedev, uint8_t index, off_t = off, + uint32_t size, void *data) +{ + struct vfio_region_info *info =3D vbasedev->regions[index]; + int ret; + + ret =3D pread(vbasedev->fd, data, size, info->offset + off); + + return ret < 0 ? -errno : ret; +} + +static int vfio_io_region_write(VFIODevice *vbasedev, uint8_t index, off_t= off, + uint32_t size, void *data) +{ + struct vfio_region_info *info =3D vbasedev->regions[index]; + int ret; + + ret =3D pwrite(vbasedev->fd, data, size, info->offset + off); + + return ret < 0 ? -errno : ret; +} + +VFIODeviceIO vfio_dev_io_ioctl =3D { + .get_region_info =3D vfio_io_get_region_info, + .get_irq_info =3D vfio_io_get_irq_info, + .set_irqs =3D vfio_io_set_irqs, + .region_read =3D vfio_io_region_read, + .region_write =3D vfio_io_region_write, +}; + static int vfio_io_dma_map(VFIOContainer *container, struct vfio_iommu_type1_dma_map *map) { diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index b214a93..c3c2e76 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -43,6 +43,14 @@ #include "migration/blocker.h" #include "migration/qemu-file.h" =20 +/* convenience macros for PCI config space */ +#define VDEV_CONFIG_READ(vbasedev, off, size, data) \ + ((vbasedev)->io->region_read((vbasedev), VFIO_PCI_CONFIG_REGION_INDEX,= \ + (off), (size), (data))) +#define VDEV_CONFIG_WRITE(vbasedev, off, size, data) \ + ((vbasedev)->io->region_write((vbasedev), VFIO_PCI_CONFIG_REGION_INDEX= , \ + (off), (size), (data))) + #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug" =20 /* Protected by BQL */ @@ -406,7 +414,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, boo= l msix) fds[i] =3D fd; } =20 - ret =3D ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set); + ret =3D 
vdev->vbasedev.io->set_irqs(&vdev->vbasedev, irq_set); =20 g_free(irq_set); =20 @@ -818,14 +826,16 @@ static void vfio_update_msi(VFIOPCIDevice *vdev) =20 static void vfio_pci_load_rom(VFIOPCIDevice *vdev) { + VFIODevice *vbasedev =3D &vdev->vbasedev; struct vfio_region_info *reg_info; uint64_t size; off_t off =3D 0; ssize_t bytes; + int ret; =20 - if (vfio_get_region_info(&vdev->vbasedev, - VFIO_PCI_ROM_REGION_INDEX, ®_info)) { - error_report("vfio: Error getting ROM info: %m"); + ret =3D vfio_get_region_info(vbasedev, VFIO_PCI_ROM_REGION_INDEX, ®= _info); + if (ret < 0) { + error_report("vfio: Error getting ROM info: %s", strerror(-ret)); return; } =20 @@ -850,18 +860,19 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev) memset(vdev->rom, 0xff, size); =20 while (size) { - bytes =3D pread(vdev->vbasedev.fd, vdev->rom + off, - size, vdev->rom_offset + off); + bytes =3D vbasedev->io->region_read(vbasedev, VFIO_PCI_ROM_REGION_= INDEX, + off, size, vdev->rom + off); if (bytes =3D=3D 0) { break; } else if (bytes > 0) { off +=3D bytes; size -=3D bytes; } else { - if (errno =3D=3D EINTR || errno =3D=3D EAGAIN) { + if (bytes =3D=3D -EINTR || bytes =3D=3D -EAGAIN) { continue; } - error_report("vfio: Error reading device ROM: %m"); + error_report("vfio: Error reading device ROM: %s", + strerror(-bytes)); break; } } @@ -949,11 +960,10 @@ static const MemoryRegionOps vfio_rom_ops =3D { =20 static void vfio_pci_size_rom(VFIOPCIDevice *vdev) { + VFIODevice *vbasedev =3D &vdev->vbasedev; uint32_t orig, size =3D cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK); - off_t offset =3D vdev->config_offset + PCI_ROM_ADDRESS; DeviceState *dev =3D DEVICE(vdev); char *name; - int fd =3D vdev->vbasedev.fd; =20 if (vdev->pdev.romfile || !vdev->pdev.rom_bar) { /* Since pci handles romfile, just print a message and return */ @@ -970,11 +980,12 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev) * Use the same size ROM BAR as the physical device. 
The contents * will get filled in later when the guest tries to read it. */ - if (pread(fd, &orig, 4, offset) !=3D 4 || - pwrite(fd, &size, 4, offset) !=3D 4 || - pread(fd, &size, 4, offset) !=3D 4 || - pwrite(fd, &orig, 4, offset) !=3D 4) { - error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name); + if (VDEV_CONFIG_READ(vbasedev, PCI_ROM_ADDRESS, 4, &orig) !=3D 4 || + VDEV_CONFIG_WRITE(vbasedev, PCI_ROM_ADDRESS, 4, &size) !=3D 4 || + VDEV_CONFIG_READ(vbasedev, PCI_ROM_ADDRESS, 4, &size) !=3D 4 || + VDEV_CONFIG_WRITE(vbasedev, PCI_ROM_ADDRESS, 4, &orig) !=3D 4) { + + error_report("%s(%s) ROM access failed", __func__, vbasedev->name); return; } =20 @@ -1154,6 +1165,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevic= e *pdev, int bar) uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len) { VFIOPCIDevice *vdev =3D VFIO_PCI_BASE(pdev); + VFIODevice *vbasedev =3D &vdev->vbasedev; uint32_t emu_bits =3D 0, emu_val =3D 0, phys_val =3D 0, val; =20 memcpy(&emu_bits, vdev->emulated_config_bits + addr, len); @@ -1166,12 +1178,13 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint= 32_t addr, int len) if (~emu_bits & (0xffffffffU >> (32 - len * 8))) { ssize_t ret; =20 - ret =3D pread(vdev->vbasedev.fd, &phys_val, len, - vdev->config_offset + addr); + ret =3D VDEV_CONFIG_READ(vbasedev, addr, len, &phys_val); if (ret !=3D len) { - error_report("%s(%s, 0x%x, 0x%x) failed: %m", - __func__, vdev->vbasedev.name, addr, len); - return -errno; + const char *err =3D ret < 0 ? 
strerror(-ret) : "short read"; + + error_report("%s(%s, 0x%x, 0x%x) failed: %s", + __func__, vbasedev->name, addr, len, err); + return -1; } phys_val =3D le32_to_cpu(phys_val); } @@ -1187,15 +1200,19 @@ void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr, uint32_t val, int len) { VFIOPCIDevice *vdev =3D VFIO_PCI_BASE(pdev); + VFIODevice *vbasedev =3D &vdev->vbasedev; uint32_t val_le =3D cpu_to_le32(val); + int ret; =20 trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len); =20 /* Write everything to VFIO, let it filter out what we can't write */ - if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr) - !=3D len) { - error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %m", - __func__, vdev->vbasedev.name, addr, val, len); + ret =3D VDEV_CONFIG_WRITE(vbasedev, addr, len, &val_le); + if (ret !=3D len) { + const char *err =3D ret < 0 ? strerror(-ret) : "short write"; + + error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %s", + __func__, vbasedev->name, addr, val, len, err); } =20 /* MSI/MSI-X Enabling/Disabling */ @@ -1283,10 +1300,13 @@ static int vfio_msi_setup(VFIOPCIDevice *vdev, int = pos, Error **errp) int ret, entries; Error *err =3D NULL; =20 - if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl), - vdev->config_offset + pos + PCI_CAP_FLAGS) !=3D sizeof(ctrl)= ) { - error_setg_errno(errp, errno, "failed reading MSI PCI_CAP_FLAGS"); - return -errno; + ret =3D VDEV_CONFIG_READ(&vdev->vbasedev, pos + PCI_CAP_FLAGS, + sizeof(ctrl), &ctrl); + if (ret !=3D sizeof(ctrl)) { + const char *err =3D ret < 0 ? 
strerror(-ret) : "short read"; + + error_setg(errp, "failed reading MSI PCI_CAP_FLAGS %s", err); + return ret; } ctrl =3D le16_to_cpu(ctrl); =20 @@ -1488,33 +1508,39 @@ static void vfio_pci_relocate_msix(VFIOPCIDevice *v= dev, Error **errp) */ static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp) { + VFIODevice *vbasedev =3D &vdev->vbasedev; uint8_t pos; uint16_t ctrl; uint32_t table, pba; - int fd =3D vdev->vbasedev.fd; VFIOMSIXInfo *msix; + int ret; =20 pos =3D pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX); if (!pos) { return; } =20 - if (pread(fd, &ctrl, sizeof(ctrl), - vdev->config_offset + pos + PCI_MSIX_FLAGS) !=3D sizeof(ctrl= )) { - error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS"); - return; + ret =3D VDEV_CONFIG_READ(vbasedev, pos + PCI_MSIX_FLAGS, + sizeof(ctrl), &ctrl); + if (ret !=3D sizeof(ctrl)) { + const char *err =3D ret < 0 ? strerror(-ret) : "short read"; + + error_setg(errp, "failed to read PCI MSIX FLAGS %s", err); } =20 - if (pread(fd, &table, sizeof(table), - vdev->config_offset + pos + PCI_MSIX_TABLE) !=3D sizeof(tabl= e)) { - error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE"); - return; + ret =3D VDEV_CONFIG_READ(vbasedev, pos + PCI_MSIX_TABLE, + sizeof(table), &table); + if (ret !=3D sizeof(table)) { + const char *err =3D ret < 0 ? strerror(-ret) : "short read"; + + error_setg(errp, "failed to read PCI MSIX TABLE %s", err); } =20 - if (pread(fd, &pba, sizeof(pba), - vdev->config_offset + pos + PCI_MSIX_PBA) !=3D sizeof(pba)) { - error_setg_errno(errp, errno, "failed to read PCI MSIX PBA"); - return; + ret =3D VDEV_CONFIG_READ(vbasedev, pos + PCI_MSIX_PBA, sizeof(pba), &p= ba); + if (ret !=3D sizeof(pba)) { + const char *err =3D ret < 0 ? 
strerror(-ret) : "short read"; + + error_setg(errp, "failed to read PCI MSIX PBA %s", err); } =20 ctrl =3D le16_to_cpu(ctrl); @@ -1652,7 +1678,6 @@ static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev= , bool enabled) static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr) { VFIOBAR *bar =3D &vdev->bars[nr]; - uint32_t pci_bar; int ret; =20 @@ -1662,10 +1687,12 @@ static void vfio_bar_prepare(VFIOPCIDevice *vdev, i= nt nr) } =20 /* Determine what type of BAR this is for registration */ - ret =3D pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar), - vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr)); + ret =3D VDEV_CONFIG_READ(&vdev->vbasedev, PCI_BASE_ADDRESS_0 + (4 * nr= ), + sizeof(pci_bar), &pci_bar); if (ret !=3D sizeof(pci_bar)) { - error_report("vfio: Failed to read BAR %d (%m)", nr); + const char *err =3D ret < 0 ? strerror(-ret) : "short read"; + + error_report("vfio: Failed to read BAR %d (%s)", nr, err); return; } =20 @@ -2213,8 +2240,9 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev) =20 static void vfio_pci_post_reset(VFIOPCIDevice *vdev) { + VFIODevice *vbasedev =3D &vdev->vbasedev; Error *err =3D NULL; - int nr; + int ret, nr; =20 vfio_intx_enable(vdev, &err); if (err) { @@ -2222,13 +2250,16 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev) } =20 for (nr =3D 0; nr < PCI_NUM_REGIONS - 1; ++nr) { - off_t addr =3D vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr); + off_t addr =3D PCI_BASE_ADDRESS_0 + (4 * nr); uint32_t val =3D 0; uint32_t len =3D sizeof(val); =20 - if (pwrite(vdev->vbasedev.fd, &val, len, addr) !=3D len) { - error_report("%s(%s) reset bar %d failed: %m", __func__, - vdev->vbasedev.name, nr); + ret =3D VDEV_CONFIG_WRITE(vbasedev, addr, len, &val); + if (ret !=3D len) { + const char *err =3D ret < 0 ? 
strerror(-ret) : "short write"; + + error_report("%s(%s) reset bar %d failed: %s", __func__, + vbasedev->name, nr, err); } } =20 @@ -2675,7 +2706,7 @@ static void vfio_populate_device(VFIOPCIDevice *vdev,= Error **errp) =20 irq_info.index =3D VFIO_PCI_ERR_IRQ_INDEX; =20 - ret =3D ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info); + ret =3D vbasedev->io->get_irq_info(vbasedev, &irq_info); if (ret) { /* This can fail for an old kernel or legacy PCI dev */ trace_vfio_populate_device_get_irq_info_failure(strerror(errno)); @@ -2794,8 +2825,10 @@ static void vfio_register_req_notifier(VFIOPCIDevice= *vdev) return; } =20 - if (ioctl(vdev->vbasedev.fd, - VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count <= 1) { + if (vdev->vbasedev.io->get_irq_info(&vdev->vbasedev, &irq_info) < 0) { + return; + } + if (irq_info.count < 1) { return; } =20 @@ -2874,6 +2907,7 @@ static void vfio_realize(PCIDevice *pdev, Error **err= p) vbasedev->ops =3D &vfio_pci_ops; vbasedev->type =3D VFIO_DEVICE_TYPE_PCI; vbasedev->dev =3D DEVICE(vdev); + vbasedev->io =3D &vfio_dev_io_ioctl; =20 tmp =3D g_strdup_printf("%s/iommu_group", vbasedev->sysfsdev); len =3D readlink(tmp, group_path, sizeof(group_path)); diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c index 5af73f9..222405e 100644 --- a/hw/vfio/platform.c +++ b/hw/vfio/platform.c @@ -621,6 +621,7 @@ static void vfio_platform_realize(DeviceState *dev, Err= or **errp) vbasedev->type =3D VFIO_DEVICE_TYPE_PLATFORM; vbasedev->dev =3D dev; vbasedev->ops =3D &vfio_platform_ops; + vbasedev->io =3D &vfio_dev_io_ioctl; =20 qemu_mutex_init(&vdev->intp_mutex); =20 --=20 1.9.4 From nobody Tue Apr 23 23:23:02 2024 Delivered-To: importer2@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=oracle.com ARC-Seal: i=1; a=rsa-sha256; 

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 06/23] vfio-user: Define type vfio_user_pci_dev_info
Date: Wed, 1 Feb 2023 21:55:42 -0800
Message-Id: <08e29735fca137ca972234727aebd73638fedb41.1675228037.git.john.g.johnson@oracle.com>

New class for vfio-user with its class and instance constructors and
destructors, and its pci ops.
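The new class follows a common C pattern: the subclass struct embeds its parent as the first member, and generic code dispatches through a per-subclass ops table. The sketch below illustrates that pattern in plain C without QOM. It is a minimal illustration only: `BaseDev`, `PciDev`, `UserPciDev`, `BaseDevOps`, `to_user_pci()` and `user_pci_realize()` are hypothetical names that loosely stand in for `VFIODevice`, `VFIOPCIDevice`, `VFIOUserPCIDevice`, `VFIODeviceOps` and `vfio_user_pci_realize()`, and the local `container_of` is a simplified version of the macro QEMU provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified container_of; QEMU's own macro adds compile-time type checks. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct BaseDev BaseDev;

/* Hypothetical stand-in for VFIODeviceOps: per-subclass behaviour. */
typedef struct BaseDevOps {
    void (*compute_needs_reset)(BaseDev *dev);
} BaseDevOps;

/* Hypothetical stand-in for VFIODevice. */
struct BaseDev {
    const BaseDevOps *ops;
    const char *name;
    bool needs_reset;
};

/* Hypothetical stand-in for VFIOPCIDevice: embeds the base device. */
typedef struct PciDev {
    BaseDev base;
    int bar_count;
} PciDev;

/* Hypothetical stand-in for VFIOUserPCIDevice: parent struct comes first. */
typedef struct UserPciDev {
    PciDev pci;
    const char *sock_name;
} UserPciDev;

/* Socket-backed devices never need a host hot reset. */
static void user_compute_needs_reset(BaseDev *dev)
{
    dev->needs_reset = false;
}

static const BaseDevOps user_pci_ops = {
    .compute_needs_reset = user_compute_needs_reset,
};

/* Walk back from the embedded base to the outermost subclass object. */
static UserPciDev *to_user_pci(BaseDev *dev)
{
    assert(dev != NULL);
    PciDev *pci = container_of(dev, PciDev, base);
    return container_of(pci, UserPciDev, pci);
}

/* Mirrors the realize step: validate the socket option, then wire up ops. */
static int user_pci_realize(UserPciDev *udev)
{
    if (udev->sock_name == NULL) {
        fprintf(stderr, "No socket specified\n");
        return -1;
    }
    udev->pci.base.name = udev->sock_name;
    udev->pci.base.ops = &user_pci_ops;
    udev->pci.base.needs_reset = true;
    /* Generic code only sees BaseDev and dispatches through the table. */
    udev->pci.base.ops->compute_needs_reset(&udev->pci.base);
    return 0;
}
```

Because the parent is the first member, `container_of` here reduces to a plain cast, but computing the offset keeps the helper correct even if the embedded member moves. In QEMU itself the downcast goes through QOM cast macros such as `VFIO_USER_PCI()`, which can also check the object type at runtime.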
Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/pci.h       |   7 +++
 hw/vfio/pci.c       |  12 +++---
 hw/vfio/user-pci.c  | 121 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 MAINTAINERS         |   3 ++
 hw/vfio/Kconfig     |  10 +++++
 hw/vfio/meson.build |   1 +
 6 files changed, 148 insertions(+), 6 deletions(-)
 create mode 100644 hw/vfio/user-pci.c

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 7fb656c..50a1d07 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -208,6 +208,13 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
 void vfio_pci_write_config(PCIDevice *pdev, uint32_t addr,
                            uint32_t val, int len);
 
+void vfio_intx_eoi(VFIODevice *vbasedev);
+Object *vfio_pci_get_object(VFIODevice *vbasedev);
+void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f);
+int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f);
+void vfio_put_device(VFIOPCIDevice *vdev);
+void vfio_instance_init(Object *obj);
+
 uint64_t vfio_vga_read(void *opaque, hwaddr addr, unsigned size);
 void vfio_vga_write(void *opaque, hwaddr addr, uint64_t data, unsigned size);
 
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c3c2e76..a8bc0ea 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -107,7 +107,7 @@ static void vfio_intx_interrupt(void *opaque)
     }
 }
 
-static void vfio_intx_eoi(VFIODevice *vbasedev)
+void vfio_intx_eoi(VFIODevice *vbasedev)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 
@@ -2492,7 +2492,7 @@ static void vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
     }
 }
 
-static Object *vfio_pci_get_object(VFIODevice *vbasedev)
+Object *vfio_pci_get_object(VFIODevice *vbasedev)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 
@@ -2517,14 +2517,14 @@ const VMStateDescription vmstate_vfio_pci_config = {
     }
 };
 
-static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
+void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 
     vmstate_save_state(f, &vmstate_vfio_pci_config, vdev, NULL);
 }
 
-static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
 {
     VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     PCIDevice *pdev = &vdev->pdev;
@@ -2719,7 +2719,7 @@ static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
     }
 }
 
-static void vfio_put_device(VFIOPCIDevice *vdev)
+void vfio_put_device(VFIOPCIDevice *vdev)
 {
     g_free(vdev->vbasedev.name);
     g_free(vdev->msix);
@@ -3271,7 +3271,7 @@ post_reset:
     vfio_pci_post_reset(vdev);
 }
 
-static void vfio_instance_init(Object *obj)
+void vfio_instance_init(Object *obj)
 {
     PCIDevice *pci_dev = PCI_DEVICE(obj);
     VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
new file mode 100644
index 0000000..fc47a3e
--- /dev/null
+++ b/hw/vfio/user-pci.c
@@ -0,0 +1,121 @@
+/*
+ * vfio PCI device over a UNIX socket.
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include
+#include
+
+#include "hw/hw.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/pci/pci_bridge.h"
+#include "hw/qdev-properties.h"
+#include "hw/qdev-properties-system.h"
+#include "migration/vmstate.h"
+#include "qapi/qmp/qdict.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "qemu/module.h"
+#include "qemu/range.h"
+#include "qemu/units.h"
+#include "sysemu/kvm.h"
+#include "sysemu/runstate.h"
+#include "pci.h"
+#include "trace.h"
+#include "qapi/error.h"
+#include "migration/blocker.h"
+#include "migration/qemu-file.h"
+
+#define TYPE_VFIO_USER_PCI "vfio-user-pci"
+OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
+
+struct VFIOUserPCIDevice {
+    VFIOPCIDevice device;
+    char *sock_name;
+};
+
+/*
+ * Emulated devices don't use host hot reset
+ */
+static void vfio_user_compute_needs_reset(VFIODevice *vbasedev)
+{
+    vbasedev->needs_reset = false;
+}
+
+static VFIODeviceOps vfio_user_pci_ops = {
+    .vfio_compute_needs_reset = vfio_user_compute_needs_reset,
+    .vfio_eoi = vfio_intx_eoi,
+    .vfio_get_object = vfio_pci_get_object,
+    .vfio_save_config = vfio_pci_save_config,
+    .vfio_load_config = vfio_pci_load_config,
+};
+
+static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
+{
+    ERRP_GUARD();
+    VFIOUserPCIDevice *udev = VFIO_USER_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+
+    /*
+     * TODO: make option parser understand SocketAddress
+     * and use that instead of having scalar options
+     * for each socket type.
+     */
+    if (!udev->sock_name) {
+        error_setg(errp, "No socket specified");
+        error_append_hint(errp, "Use -device vfio-user-pci,socket=\n");
+        return;
+    }
+
+    vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
+    vbasedev->ops = &vfio_user_pci_ops;
+    vbasedev->type = VFIO_DEVICE_TYPE_PCI;
+    vbasedev->dev = DEVICE(vdev);
+
+}
+
+static void vfio_user_instance_finalize(Object *obj)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
+
+    vfio_put_device(vdev);
+}
+
+static Property vfio_user_pci_dev_properties[] = {
+    DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void vfio_user_pci_dev_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
+
+    device_class_set_props(dc, vfio_user_pci_dev_properties);
+    dc->desc = "VFIO over socket PCI device assignment";
+    pdc->realize = vfio_user_pci_realize;
+}
+
+static const TypeInfo vfio_user_pci_dev_info = {
+    .name = TYPE_VFIO_USER_PCI,
+    .parent = TYPE_VFIO_PCI_BASE,
+    .instance_size = sizeof(VFIOUserPCIDevice),
+    .class_init = vfio_user_pci_dev_class_init,
+    .instance_init = vfio_instance_init,
+    .instance_finalize = vfio_user_instance_finalize,
+};
+
+static void register_vfio_user_dev_type(void)
+{
+    type_register_static(&vfio_user_pci_dev_info);
+}
+
+type_init(register_vfio_user_dev_type)
diff --git a/MAINTAINERS b/MAINTAINERS
index 999340d..28fce0e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1987,8 +1987,11 @@ L: qemu-s390x@nongnu.org
 vfio-user
 M: John G Johnson
 M: Thanos Makatos
+M: Elena Ufimtseva
+M: Jagannathan Raman
 S: Supported
 F: docs/devel/vfio-user.rst
+F: hw/vfio/user*
 
 vhost
 M: Michael S. Tsirkin
diff --git a/hw/vfio/Kconfig b/hw/vfio/Kconfig
index 7cdba05..301894e 100644
--- a/hw/vfio/Kconfig
+++ b/hw/vfio/Kconfig
@@ -2,6 +2,10 @@ config VFIO
     bool
     depends on LINUX
 
+config VFIO_USER
+    bool
+    depends on VFIO
+
 config VFIO_PCI
     bool
     default y
@@ -9,6 +13,12 @@ config VFIO_PCI
     select EDID
     depends on LINUX && PCI
 
+config VFIO_USER_PCI
+    bool
+    default y
+    select VFIO_USER
+    depends on VFIO_PCI
+
 config VFIO_CCW
     bool
     default y
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index da9af29..731c3c6 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -9,6 +9,7 @@ vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'pci-quirks.c',
   'pci.c',
 ))
+vfio_ss.add(when: 'CONFIG_VFIO_USER_PCI', if_true: files('user-pci.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_CCW', if_true: files('ccw.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_PLATFORM', if_true: files('platform.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_XGMAC', if_true: files('calxeda-xgmac.c'))
-- 
1.9.4

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 07/23] vfio-user: connect vfio proxy to remote server
Date: Wed, 1 Feb 2023 21:55:43 -0800

add user.c & user.h files for vfio-user code
add proxy struct to handle comms with remote server

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user.h                |  78 +++++++++++++++++++
 include/hw/vfio/vfio-common.h |   2 +
 hw/vfio/user-pci.c            |  19 +++++
 hw/vfio/user.c                | 170 ++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/meson.build           |   1 +
 5 files changed, 270 insertions(+)
 create mode 100644 hw/vfio/user.h
 create mode 100644 hw/vfio/user.c

diff --git a/hw/vfio/user.h b/hw/vfio/user.h
new file mode 100644
index 0000000..ac7d15d
--- /dev/null
+++ b/hw/vfio/user.h
@@ -0,0 +1,78 @@
+#ifndef VFIO_USER_H
+#define VFIO_USER_H
+
+/*
+ * vfio protocol over a UNIX socket.
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+typedef struct {
+    int send_fds;
+    int recv_fds;
+    int *fds;
+} VFIOUserFDs;
+
+enum msg_type {
+    VFIO_MSG_NONE,
+    VFIO_MSG_ASYNC,
+    VFIO_MSG_WAIT,
+    VFIO_MSG_NOWAIT,
+    VFIO_MSG_REQ,
+};
+
+typedef struct VFIOUserMsg {
+    QTAILQ_ENTRY(VFIOUserMsg) next;
+    VFIOUserFDs *fds;
+    uint32_t rsize;
+    uint32_t id;
+    QemuCond cv;
+    bool complete;
+    enum msg_type type;
+} VFIOUserMsg;
+
+
+enum proxy_state {
+    VFIO_PROXY_CONNECTED = 1,
+    VFIO_PROXY_ERROR = 2,
+    VFIO_PROXY_CLOSING = 3,
+    VFIO_PROXY_CLOSED = 4,
+};
+
+typedef QTAILQ_HEAD(VFIOUserMsgQ, VFIOUserMsg) VFIOUserMsgQ;
+
+typedef struct VFIOUserProxy {
+    QLIST_ENTRY(VFIOUserProxy) next;
+    char *sockname;
+    struct QIOChannel *ioc;
+    void (*request)(void *opaque, VFIOUserMsg *msg);
+    void *req_arg;
+    int flags;
+    QemuCond close_cv;
+    AioContext *ctx;
+    QEMUBH *req_bh;
+
+    /*
+     * above only changed when BQL is held
+     * below are protected by per-proxy lock
+     */
+    QemuMutex lock;
+    VFIOUserMsgQ free;
+    VFIOUserMsgQ pending;
+    VFIOUserMsgQ incoming;
+    VFIOUserMsgQ outgoing;
+    VFIOUserMsg *last_nowait;
+    enum proxy_state state;
+} VFIOUserProxy;
+
+/* VFIOProxy flags */
+#define VFIO_PROXY_CLIENT 0x1
+
+VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp);
+void vfio_user_disconnect(VFIOUserProxy *proxy);
+
+#endif /* VFIO_USER_H */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index c2ff9ea..e1ee0ac 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -76,6 +76,7 @@ typedef struct VFIOAddressSpace {
 
 struct VFIOGroup;
 typedef struct VFIOContainerIO VFIOContainerIO;
+typedef struct VFIOUserProxy VFIOUserProxy;
 
 typedef struct VFIOContainer {
     VFIOAddressSpace *space;
@@ -147,6 +148,7 @@ typedef struct VFIODevice {
     VFIOMigration *migration;
     Error *migration_blocker;
     OnOffAuto pre_copy_dirty_page_tracking;
+    VFIOUserProxy *proxy;
     struct vfio_region_info **regions;
 } VFIODevice;
 
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index fc47a3e..a3fc36d 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -32,6 +32,7 @@
 #include "qapi/error.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
+#include "hw/vfio/user.h"
 
 #define TYPE_VFIO_USER_PCI "vfio-user-pci"
 OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
@@ -63,6 +64,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     VFIOUserPCIDevice *udev = VFIO_USER_PCI(pdev);
     VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIODevice *vbasedev = &vdev->vbasedev;
+    SocketAddress addr;
+    VFIOUserProxy *proxy;
+    Error *err = NULL;
 
     /*
      * TODO: make option parser understand SocketAddress
@@ -75,6 +79,16 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
         return;
     }
 
+    memset(&addr, 0, sizeof(addr));
+    addr.type = SOCKET_ADDRESS_TYPE_UNIX;
+    addr.u.q_unix.path = udev->sock_name;
+    proxy = vfio_user_connect_dev(&addr, &err);
+    if (!proxy) {
+        error_propagate(errp, err);
+        return;
+    }
+    vbasedev->proxy = proxy;
+
     vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
     vbasedev->ops = &vfio_user_pci_ops;
     vbasedev->type = VFIO_DEVICE_TYPE_PCI;
@@ -85,8 +99,13 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
 static void vfio_user_instance_finalize(Object *obj)
 {
     VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
+    VFIODevice *vbasedev = &vdev->vbasedev;
 
     vfio_put_device(vdev);
+
+    if (vbasedev->proxy != NULL) {
+        vfio_user_disconnect(vbasedev->proxy);
+    }
 }
 
 static Property vfio_user_pci_dev_properties[] = {
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
new file mode 100644
index 0000000..3d4b0cc
--- /dev/null
+++ b/hw/vfio/user.c
@@ -0,0 +1,170 @@
+/*
+ * vfio protocol over a UNIX socket.
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include
+#include
+
+#include "qemu/error-report.h"
+#include "qapi/error.h"
+#include "qemu/main-loop.h"
+#include "hw/hw.h"
+#include "hw/vfio/vfio-common.h"
+#include "hw/vfio/vfio.h"
+#include "qemu/sockets.h"
+#include "io/channel.h"
+#include "io/channel-socket.h"
+#include "io/channel-util.h"
+#include "sysemu/iothread.h"
+#include "user.h"
+
+static IOThread *vfio_user_iothread;
+
+static void vfio_user_shutdown(VFIOUserProxy *proxy);
+
+
+/*
+ * Functions called by main, CPU, or iothread threads
+ */
+
+static void vfio_user_shutdown(VFIOUserProxy *proxy)
+{
+    qio_channel_shutdown(proxy->ioc, QIO_CHANNEL_SHUTDOWN_READ, NULL);
+    qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx, NULL, NULL, NULL);
+}
+
+/*
+ * Functions only called by iothread
+ */
+
+static void vfio_user_cb(void *opaque)
+{
+    VFIOUserProxy *proxy = opaque;
+
+    QEMU_LOCK_GUARD(&proxy->lock);
+
+    proxy->state = VFIO_PROXY_CLOSED;
+    qemu_cond_signal(&proxy->close_cv);
+}
+
+
+/*
+ * Functions called by main or CPU threads
+ */
+
+static QLIST_HEAD(, VFIOUserProxy) vfio_user_sockets =
+    QLIST_HEAD_INITIALIZER(vfio_user_sockets);
+
+VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp)
+{
+    VFIOUserProxy *proxy;
+    QIOChannelSocket *sioc;
+    QIOChannel *ioc;
+    char *sockname;
+
+    if (addr->type != SOCKET_ADDRESS_TYPE_UNIX) {
+        error_setg(errp, "vfio_user_connect - bad address family");
+        return NULL;
+    }
+    sockname = addr->u.q_unix.path;
+
+    sioc = qio_channel_socket_new();
+    ioc = QIO_CHANNEL(sioc);
+    if (qio_channel_socket_connect_sync(sioc, addr, errp)) {
+        object_unref(OBJECT(ioc));
+        return NULL;
+    }
+    qio_channel_set_blocking(ioc, false, NULL);
+
+    proxy = g_malloc0(sizeof(VFIOUserProxy));
+    proxy->sockname = g_strdup_printf("unix:%s", sockname);
+    proxy->ioc = ioc;
+    proxy->flags = VFIO_PROXY_CLIENT;
+    proxy->state = VFIO_PROXY_CONNECTED;
+
+    qemu_mutex_init(&proxy->lock);
+    qemu_cond_init(&proxy->close_cv);
+
+    if (vfio_user_iothread == NULL) {
+        vfio_user_iothread = iothread_create("VFIO user", errp);
+    }
+
+    proxy->ctx = iothread_get_aio_context(vfio_user_iothread);
+
+    QTAILQ_INIT(&proxy->outgoing);
+    QTAILQ_INIT(&proxy->incoming);
+    QTAILQ_INIT(&proxy->free);
+    QTAILQ_INIT(&proxy->pending);
+    QLIST_INSERT_HEAD(&vfio_user_sockets, proxy, next);
+
+    return proxy;
+}
+
+void vfio_user_disconnect(VFIOUserProxy *proxy)
+{
+    VFIOUserMsg *r1, *r2;
+
+    qemu_mutex_lock(&proxy->lock);
+
+    /* our side is quitting */
+    if (proxy->state == VFIO_PROXY_CONNECTED) {
+        vfio_user_shutdown(proxy);
+        if (!QTAILQ_EMPTY(&proxy->pending)) {
+            error_printf("vfio_user_disconnect: outstanding requests\n");
+        }
+    }
+    object_unref(OBJECT(proxy->ioc));
+    proxy->ioc = NULL;
+
+    proxy->state = VFIO_PROXY_CLOSING;
+    QTAILQ_FOREACH_SAFE(r1, &proxy->outgoing, next, r2) {
+        qemu_cond_destroy(&r1->cv);
+        QTAILQ_REMOVE(&proxy->outgoing, r1, next);
+        g_free(r1);
+    }
+    QTAILQ_FOREACH_SAFE(r1, &proxy->incoming, next, r2) {
+        qemu_cond_destroy(&r1->cv);
+        QTAILQ_REMOVE(&proxy->incoming, r1, next);
+        g_free(r1);
+    }
+    QTAILQ_FOREACH_SAFE(r1, &proxy->pending, next, r2) {
+        qemu_cond_destroy(&r1->cv);
+        QTAILQ_REMOVE(&proxy->pending, r1, next);
+        g_free(r1);
+    }
+    QTAILQ_FOREACH_SAFE(r1, &proxy->free, next, r2) {
+        qemu_cond_destroy(&r1->cv);
+        QTAILQ_REMOVE(&proxy->free, r1, next);
+        g_free(r1);
+    }
+
+    /*
+     * Make sure the iothread isn't blocking anywhere
+     * with a ref to this proxy by waiting for a BH
+     * handler to run after the proxy fd handlers were
+     * deleted above.
+     */
+    aio_bh_schedule_oneshot(proxy->ctx, vfio_user_cb, proxy);
+    qemu_cond_wait(&proxy->close_cv, &proxy->lock);
+
+    /* we now hold the only ref to proxy */
+    qemu_mutex_unlock(&proxy->lock);
+    qemu_cond_destroy(&proxy->close_cv);
+    qemu_mutex_destroy(&proxy->lock);
+
+    QLIST_REMOVE(proxy, next);
+    if (QLIST_EMPTY(&vfio_user_sockets)) {
+        iothread_destroy(vfio_user_iothread);
+        vfio_user_iothread = NULL;
+    }
+
+    g_free(proxy->sockname);
+    g_free(proxy);
+}
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 731c3c6..f24a47d 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -9,6 +9,7 @@ vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'pci-quirks.c',
   'pci.c',
 ))
+vfio_ss.add(when: 'CONFIG_VFIO_USER', if_true: files('user.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_USER_PCI', if_true: files('user-pci.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_CCW', if_true: files('ccw.c'))
 vfio_ss.add(when: 'CONFIG_VFIO_PLATFORM', if_true: files('platform.c'))
-- 
1.9.4

From nobody Tue Apr 23 23:23:02 2024
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 08/23] vfio-user: define socket receive functions
Date: Wed, 1 Feb 2023 21:55:44 -0800
Message-Id: <7c3b77a4930a747915d6b38e90c7b31f1e028501.1675228037.git.john.g.johnson@oracle.com>
X-Mailer: git-send-email 1.8.3.1
Add infrastructure needed to receive incoming messages

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user-protocol.h |  54 +++++++
 hw/vfio/user.h          |   8 +
 hw/vfio/user-pci.c      |  11 ++
 hw/vfio/user.c          | 408 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events    |   5 +
 5 files changed, 486 insertions(+)
 create mode 100644 hw/vfio/user-protocol.h

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
new file mode 100644
index 0000000..d23877c
--- /dev/null
+++ b/hw/vfio/user-protocol.h
@@ -0,0 +1,54 @@
+#ifndef VFIO_USER_PROTOCOL_H
+#define VFIO_USER_PROTOCOL_H
+
+/*
+ * vfio protocol over a UNIX socket.
+ *
+ * Copyright © 2018, 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ * Each message has a standard header that describes the command
+ * being sent, which is almost always a VFIO ioctl().
+ *
+ * The header may be followed by command-specific data, such as the
+ * region and offset info for read and write commands.
+ */
+
+typedef struct {
+    uint16_t id;
+    uint16_t command;
+    uint32_t size;
+    uint32_t flags;
+    uint32_t error_reply;
+} VFIOUserHdr;
+
+/* VFIOUserHdr commands */
+enum vfio_user_command {
+    VFIO_USER_VERSION = 1,
+    VFIO_USER_DMA_MAP = 2,
+    VFIO_USER_DMA_UNMAP = 3,
+    VFIO_USER_DEVICE_GET_INFO = 4,
+    VFIO_USER_DEVICE_GET_REGION_INFO = 5,
+    VFIO_USER_DEVICE_GET_REGION_IO_FDS = 6,
+    VFIO_USER_DEVICE_GET_IRQ_INFO = 7,
+    VFIO_USER_DEVICE_SET_IRQS = 8,
+    VFIO_USER_REGION_READ = 9,
+    VFIO_USER_REGION_WRITE = 10,
+    VFIO_USER_DMA_READ = 11,
+    VFIO_USER_DMA_WRITE = 12,
+    VFIO_USER_DEVICE_RESET = 13,
+    VFIO_USER_DIRTY_PAGES = 14,
+    VFIO_USER_MAX,
+};
+
+/* VFIOUserHdr flags */
+#define VFIO_USER_REQUEST 0x0
+#define VFIO_USER_REPLY 0x1
+#define VFIO_USER_TYPE 0xF
+
+#define VFIO_USER_NO_REPLY 0x10
+#define VFIO_USER_ERROR 0x20
+
+#endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index ac7d15d..5259a30 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -11,6 +11,8 @@
  *
  */
 
+#include "user-protocol.h"
+
 typedef struct {
     int send_fds;
     int recv_fds;
@@ -27,6 +29,7 @@ enum msg_type {
 
 typedef struct VFIOUserMsg {
     QTAILQ_ENTRY(VFIOUserMsg) next;
+    VFIOUserHdr *hdr;
     VFIOUserFDs *fds;
     uint32_t rsize;
     uint32_t id;
@@ -66,6 +69,8 @@ typedef struct VFIOUserProxy {
     VFIOUserMsgQ incoming;
     VFIOUserMsgQ outgoing;
     VFIOUserMsg *last_nowait;
+    VFIOUserMsg *part_recv;
+    size_t recv_left;
     enum proxy_state state;
 } VFIOUserProxy;
 
@@ -74,5 +79,8 @@ typedef struct VFIOUserProxy {
 
 VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp);
 void vfio_user_disconnect(VFIOUserProxy *proxy);
+void vfio_user_set_handler(VFIODevice *vbasedev,
+                           void (*handler)(void *opaque, VFIOUserMsg *msg),
+                           void *reqarg);
 
 #endif /* VFIO_USER_H */
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index a3fc36d..8b4e3ea 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -43,6 +43,16 @@ struct VFIOUserPCIDevice {
 };
 
 /*
+ * Incoming request message callback.
+ *
+ * Runs off main loop, so BQL held.
+ */
+static void vfio_user_pci_process_req(void *opaque, VFIOUserMsg *msg)
+{
+
+}
+
+/*
  * Emulated devices don't use host hot reset
  */
 static void vfio_user_compute_needs_reset(VFIODevice *vbasedev)
@@ -88,6 +98,7 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
         return;
     }
     vbasedev->proxy = proxy;
+    vfio_user_set_handler(vbasedev, vfio_user_pci_process_req, vdev);
 
     vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
     vbasedev->ops = &vfio_user_pci_ops;
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 3d4b0cc..f20e196 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -24,11 +24,27 @@
 #include "io/channel-util.h"
 #include "sysemu/iothread.h"
 #include "user.h"
+#include "trace.h"
 
 static IOThread *vfio_user_iothread;
 
 static void vfio_user_shutdown(VFIOUserProxy *proxy);
+static VFIOUserMsg *vfio_user_getmsg(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                     VFIOUserFDs *fds);
+static VFIOUserFDs *vfio_user_getfds(int numfds);
+static void vfio_user_recycle(VFIOUserProxy *proxy, VFIOUserMsg *msg);
 
+static void vfio_user_recv(void *opaque);
+static int vfio_user_recv_one(VFIOUserProxy *proxy);
+static void vfio_user_cb(void *opaque);
+
+static void vfio_user_request(void *opaque);
+
+static inline void vfio_user_set_error(VFIOUserHdr *hdr, uint32_t err)
+{
+    hdr->flags |= VFIO_USER_ERROR;
+    hdr->error_reply = err;
+}
 
 /*
  * Functions called by main, CPU, or iothread threads
@@ -40,10 +56,340 @@ static void vfio_user_shutdown(VFIOUserProxy *proxy)
     qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx, NULL, NULL, NULL);
 }
 
+static VFIOUserMsg *vfio_user_getmsg(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                     VFIOUserFDs *fds)
+{
+    VFIOUserMsg *msg;
+
+    msg = QTAILQ_FIRST(&proxy->free);
+    if (msg != NULL) {
+        QTAILQ_REMOVE(&proxy->free, msg, next);
+    } else {
+        msg = g_malloc0(sizeof(*msg));
+        qemu_cond_init(&msg->cv);
+    }
+
+    msg->hdr = hdr;
+    msg->fds = fds;
+    return msg;
+}
+
+/*
+ * Recycle a message list entry to the free list.
+ */
+static void vfio_user_recycle(VFIOUserProxy *proxy, VFIOUserMsg *msg)
+{
+    if (msg->type == VFIO_MSG_NONE) {
+        error_printf("vfio_user_recycle - freeing free msg\n");
+        return;
+    }
+
+    /* free msg buffer if no one is waiting to consume the reply */
+    if (msg->type == VFIO_MSG_NOWAIT || msg->type == VFIO_MSG_ASYNC) {
+        g_free(msg->hdr);
+        if (msg->fds != NULL) {
+            g_free(msg->fds);
+        }
+    }
+
+    msg->type = VFIO_MSG_NONE;
+    msg->hdr = NULL;
+    msg->fds = NULL;
+    msg->complete = false;
+    QTAILQ_INSERT_HEAD(&proxy->free, msg, next);
+}
+
+static VFIOUserFDs *vfio_user_getfds(int numfds)
+{
+    VFIOUserFDs *fds = g_malloc0(sizeof(*fds) + (numfds * sizeof(int)));
+
+    fds->fds = (int *)((char *)fds + sizeof(*fds));
+
+    return fds;
+}
+
 /*
  * Functions only called by iothread
  */
 
+/*
+ * Process a received message.
+ */
+static void vfio_user_process(VFIOUserProxy *proxy, VFIOUserMsg *msg,
+                              bool isreply)
+{
+
+    /*
+     * Replies signal a waiter, if none just check for errors
+     * and free the message buffer.
+     *
+     * Requests get queued for the BH.
+     */
+    if (isreply) {
+        msg->complete = true;
+        if (msg->type == VFIO_MSG_WAIT) {
+            qemu_cond_signal(&msg->cv);
+        } else {
+            if (msg->hdr->flags & VFIO_USER_ERROR) {
+                error_printf("vfio_user_process: error reply on async ");
+                error_printf("request command %x error %s\n",
+                             msg->hdr->command,
+                             strerror(msg->hdr->error_reply));
+            }
+            /* youngest nowait msg has been ack'd */
+            if (proxy->last_nowait == msg) {
+                proxy->last_nowait = NULL;
+            }
+            vfio_user_recycle(proxy, msg);
+        }
+    } else {
+        QTAILQ_INSERT_TAIL(&proxy->incoming, msg, next);
+        qemu_bh_schedule(proxy->req_bh);
+    }
+}
+
+/*
+ * Complete a partial message read
+ */
+static int vfio_user_complete(VFIOUserProxy *proxy, Error **errp)
+{
+    VFIOUserMsg *msg = proxy->part_recv;
+    size_t msgleft = proxy->recv_left;
+    bool isreply;
+    char *data;
+    int ret;
+
+    data = (char *)msg->hdr + (msg->hdr->size - msgleft);
+    while (msgleft > 0) {
+        ret = qio_channel_read(proxy->ioc, data, msgleft, errp);
+
+        /* error or would block */
+        if (ret <= 0) {
+            /* try for rest on next iteration */
+            if (ret == QIO_CHANNEL_ERR_BLOCK) {
+                proxy->recv_left = msgleft;
+            }
+            return ret;
+        }
+        trace_vfio_user_recv_read(msg->hdr->id, ret);
+
+        msgleft -= ret;
+        data += ret;
+    }
+
+    /*
+     * Read complete message, process it.
+     */
+    proxy->part_recv = NULL;
+    proxy->recv_left = 0;
+    isreply = (msg->hdr->flags & VFIO_USER_TYPE) == VFIO_USER_REPLY;
+    vfio_user_process(proxy, msg, isreply);
+
+    /* return positive value */
+    return 1;
+}
+
+static void vfio_user_recv(void *opaque)
+{
+    VFIOUserProxy *proxy = opaque;
+
+    QEMU_LOCK_GUARD(&proxy->lock);
+
+    if (proxy->state == VFIO_PROXY_CONNECTED) {
+        while (vfio_user_recv_one(proxy) == 0) {
+            ;
+        }
+    }
+}
+
+/*
+ * Receive and process one incoming message.
+ *
+ * For replies, find matching outgoing request and wake any waiters.
+ * For requests, queue in incoming list and run request BH.
+ */
+static int vfio_user_recv_one(VFIOUserProxy *proxy)
+{
+    VFIOUserMsg *msg = NULL;
+    g_autofree int *fdp = NULL;
+    VFIOUserFDs *reqfds;
+    VFIOUserHdr hdr;
+    struct iovec iov = {
+        .iov_base = &hdr,
+        .iov_len = sizeof(hdr),
+    };
+    bool isreply = false;
+    int i, ret;
+    size_t msgleft, numfds = 0;
+    char *data = NULL;
+    char *buf = NULL;
+    Error *local_err = NULL;
+
+    /*
+     * Complete any partial reads
+     */
+    if (proxy->part_recv != NULL) {
+        ret = vfio_user_complete(proxy, &local_err);
+
+        /* still not complete, try later */
+        if (ret == QIO_CHANNEL_ERR_BLOCK) {
+            return ret;
+        }
+
+        if (ret <= 0) {
+            goto fatal;
+        }
+        /* else fall into reading another msg */
+    }
+
+    /*
+     * Read header
+     */
+    ret = qio_channel_readv_full(proxy->ioc, &iov, 1, &fdp, &numfds,
+                                 &local_err);
+    if (ret == QIO_CHANNEL_ERR_BLOCK) {
+        return ret;
+    }
+
+    /* read error or other side closed connection */
+    if (ret <= 0) {
+        goto fatal;
+    }
+
+    if (ret < sizeof(hdr)) {
+        error_setg(&local_err, "short read of header");
+        goto fatal;
+    }
+
+    /*
+     * Validate header
+     */
+    if (hdr.size < sizeof(VFIOUserHdr)) {
+        error_setg(&local_err, "bad header size");
+        goto fatal;
+    }
+    switch (hdr.flags & VFIO_USER_TYPE) {
+    case VFIO_USER_REQUEST:
+        isreply = false;
+        break;
+    case VFIO_USER_REPLY:
+        isreply = true;
+        break;
+    default:
+        error_setg(&local_err, "unknown message type");
+        goto fatal;
+    }
+    trace_vfio_user_recv_hdr(proxy->sockname, hdr.id, hdr.command, hdr.size,
+                             hdr.flags);
+
+    /*
+     * For replies, find the matching pending request.
+     * For requests, reap incoming FDs.
+     */
+    if (isreply) {
+        QTAILQ_FOREACH(msg, &proxy->pending, next) {
+            if (hdr.id == msg->id) {
+                break;
+            }
+        }
+        if (msg == NULL) {
+            error_setg(&local_err, "unexpected reply");
+            goto err;
+        }
+        QTAILQ_REMOVE(&proxy->pending, msg, next);
+
+        /*
+         * Process any received FDs
+         */
+        if (numfds != 0) {
+            if (msg->fds == NULL || msg->fds->recv_fds < numfds) {
+                error_setg(&local_err, "unexpected FDs");
+                goto err;
+            }
+            msg->fds->recv_fds = numfds;
+            memcpy(msg->fds->fds, fdp, numfds * sizeof(int));
+        }
+    } else {
+        if (numfds != 0) {
+            reqfds = vfio_user_getfds(numfds);
+            memcpy(reqfds->fds, fdp, numfds * sizeof(int));
+        } else {
+            reqfds = NULL;
+        }
+    }
+
+    /*
+     * Put the whole message into a single buffer.
+     */
+    if (isreply) {
+        if (hdr.size > msg->rsize) {
+            error_setg(&local_err, "reply larger than recv buffer");
+            goto err;
+        }
+        *msg->hdr = hdr;
+        data = (char *)msg->hdr + sizeof(hdr);
+    } else {
+        buf = g_malloc0(hdr.size);
+        memcpy(buf, &hdr, sizeof(hdr));
+        data = buf + sizeof(hdr);
+        msg = vfio_user_getmsg(proxy, (VFIOUserHdr *)buf, reqfds);
+        msg->type = VFIO_MSG_REQ;
+    }
+
+    /*
+     * Read rest of message.
+     */
+    msgleft = hdr.size - sizeof(hdr);
+    while (msgleft > 0) {
+        ret = qio_channel_read(proxy->ioc, data, msgleft, &local_err);
+
+        /* prepare to complete read on next iteration */
+        if (ret == QIO_CHANNEL_ERR_BLOCK) {
+            proxy->part_recv = msg;
+            proxy->recv_left = msgleft;
+            return ret;
+        }
+
+        if (ret <= 0) {
+            goto fatal;
+        }
+        trace_vfio_user_recv_read(hdr.id, ret);
+
+        msgleft -= ret;
+        data += ret;
+    }
+
+    vfio_user_process(proxy, msg, isreply);
+    return 0;
+
+    /*
+     * fatal means the other side closed or we don't trust the stream
+     * err means this message is corrupt
+     */
+fatal:
+    vfio_user_shutdown(proxy);
+    proxy->state = VFIO_PROXY_ERROR;
+
+    /* set error if server side closed */
+    if (ret == 0) {
+        error_setg(&local_err, "server closed socket");
+    }
+
+err:
+    for (i = 0; i < numfds; i++) {
+        close(fdp[i]);
+    }
+    if (isreply && msg != NULL) {
+        /* force an error to keep sending thread from hanging */
+        vfio_user_set_error(msg->hdr, EINVAL);
+        msg->complete = true;
+        qemu_cond_signal(&msg->cv);
+    }
+    error_prepend(&local_err, "vfio_user_recv_one: ");
+    error_report_err(local_err);
+    return -1;
+}
+
 static void vfio_user_cb(void *opaque)
 {
     VFIOUserProxy *proxy = opaque;
@@ -59,6 +405,53 @@ static void vfio_user_cb(void *opaque)
  * Functions called by main or CPU threads
  */
 
+/*
+ * Process incoming requests.
+ *
+ * The bus-specific callback has the form:
+ *    request(opaque, msg)
+ * where 'opaque' was specified in vfio_user_set_handler
+ * and 'msg' is the inbound message.
+ *
+ * The callback is responsible for disposing of the message buffer,
+ * usually by re-using it when calling vfio_send_reply or vfio_send_error,
+ * both of which free their message buffer when the reply is sent.
+ *
+ * If the callback uses a new buffer, it needs to free the old one.
+ */
+static void vfio_user_request(void *opaque)
+{
+    VFIOUserProxy *proxy = opaque;
+    VFIOUserMsgQ new, free;
+    VFIOUserMsg *msg, *m1;
+
+    /* reap all incoming */
+    QTAILQ_INIT(&new);
+    WITH_QEMU_LOCK_GUARD(&proxy->lock) {
+        QTAILQ_FOREACH_SAFE(msg, &proxy->incoming, next, m1) {
+            QTAILQ_REMOVE(&proxy->incoming, msg, next);
+            QTAILQ_INSERT_TAIL(&new, msg, next);
+        }
+    }
+
+    /* process list */
+    QTAILQ_INIT(&free);
+    QTAILQ_FOREACH_SAFE(msg, &new, next, m1) {
+        QTAILQ_REMOVE(&new, msg, next);
+        trace_vfio_user_recv_request(msg->hdr->command);
+        proxy->request(proxy->req_arg, msg);
+        QTAILQ_INSERT_HEAD(&free, msg, next);
+    }
+
+    /* free list */
+    WITH_QEMU_LOCK_GUARD(&proxy->lock) {
+        QTAILQ_FOREACH_SAFE(msg, &free, next, m1) {
+            vfio_user_recycle(proxy, msg);
+        }
+    }
+}
+
+
 static QLIST_HEAD(, VFIOUserProxy) vfio_user_sockets =
     QLIST_HEAD_INITIALIZER(vfio_user_sockets);
 
@@ -97,6 +490,7 @@ VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp)
     }
 
     proxy->ctx = iothread_get_aio_context(vfio_user_iothread);
+    proxy->req_bh = qemu_bh_new(vfio_user_request, proxy);
 
     QTAILQ_INIT(&proxy->outgoing);
     QTAILQ_INIT(&proxy->incoming);
@@ -107,6 +501,18 @@ VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp)
     return proxy;
 }
 
+void vfio_user_set_handler(VFIODevice *vbasedev,
+                           void (*handler)(void *opaque, VFIOUserMsg *msg),
+                           void *req_arg)
+{
+    VFIOUserProxy *proxy = vbasedev->proxy;
+
+    proxy->request = handler;
+    proxy->req_arg = req_arg;
+    qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
+                                   vfio_user_recv, NULL, proxy);
+}
+
 void vfio_user_disconnect(VFIOUserProxy *proxy)
 {
     VFIOUserMsg *r1, *r2;
@@ -122,6 +528,8 @@ void vfio_user_disconnect(VFIOUserProxy *proxy)
     }
     object_unref(OBJECT(proxy->ioc));
     proxy->ioc = NULL;
+    qemu_bh_delete(proxy->req_bh);
+    proxy->req_bh = NULL;
 
     proxy->state = VFIO_PROXY_CLOSING;
     QTAILQ_FOREACH_SAFE(r1, &proxy->outgoing, next, r2) {
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73dffe9..73cc121 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -166,3 +166,8 @@ vfio_load_state_device_data(const char *name, uint64_t data_offset, uint64_t dat
 vfio_load_cleanup(const char *name) " (%s)"
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
+
+# user.c
+vfio_user_recv_hdr(const char *name, uint16_t id, uint16_t cmd, uint32_t size, uint32_t flags) " (%s) id 0x%x cmd 0x%x size 0x%x flags 0x%x"
+vfio_user_recv_read(uint16_t id, int read) " id 0x%x read 0x%x"
+vfio_user_recv_request(uint16_t cmd) " command 0x%x"
-- 
1.9.4

From nobody Tue Apr 23 23:23:02 2024
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 09/23] vfio-user: define socket send functions
Date: Wed, 1 Feb 2023 21:55:45 -0800
Message-Id: <1bb359101a9396f480463cda3269fbd9851a114a.1675228037.git.john.g.johnson@oracle.com>
X-Mailer: git-send-email 1.8.3.1
DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer2=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer2=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @oracle.com) X-ZM-MESSAGEID: 1675316757496100003 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Also negotiate protocol version with remote server Signed-off-by: Jagannathan Raman Signed-off-by: Elena Ufimtseva Signed-off-by: John G Johnson --- hw/vfio/user-protocol.h | 62 ++++++ hw/vfio/user.h | 9 + hw/vfio/user-pci.c | 16 ++ hw/vfio/user.c | 512 ++++++++++++++++++++++++++++++++++++++++++++= ++++ hw/vfio/trace-events | 2 + 5 files changed, 601 insertions(+) diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h index d23877c..5de5b20 100644 --- a/hw/vfio/user-protocol.h +++ b/hw/vfio/user-protocol.h @@ -51,4 +51,66 @@ enum vfio_user_command { #define VFIO_USER_NO_REPLY 0x10 #define VFIO_USER_ERROR 0x20 =20 + +/* + * VFIO_USER_VERSION + */ +typedef struct { + VFIOUserHdr hdr; + uint16_t major; + uint16_t minor; + char capabilities[]; +} VFIOUserVersion; + +#define VFIO_USER_MAJOR_VER 0 +#define VFIO_USER_MINOR_VER 0 + +#define VFIO_USER_CAP "capabilities" + +/* "capabilities" members */ +#define VFIO_USER_CAP_MAX_FDS "max_msg_fds" +#define VFIO_USER_CAP_MAX_XFER "max_data_xfer_size" +#define VFIO_USER_CAP_PGSIZES "pgsizes" +#define VFIO_USER_CAP_MAP_MAX "max_dma_maps" +#define VFIO_USER_CAP_MIGR "migration" + +/* "migration" members */ +#define VFIO_USER_CAP_PGSIZE "pgsize" +#define VFIO_USER_CAP_MAX_BITMAP "max_bitmap_size" + +/* + * Max FDs mainly comes into play when a device supports multiple interrup= ts 
+ * where each one uses an eventfd to inject it into the guest.
+ * It is clamped by the number of FDs the qio channel supports in a
+ * single message.
+ */
+#define VFIO_USER_DEF_MAX_FDS 8
+#define VFIO_USER_MAX_MAX_FDS 16
+
+/*
+ * Max transfer limits the amount of data in region and DMA messages.
+ * Region R/W will be very small (limited by how much a single instruction
+ * can process) so just use a reasonable limit here.
+ */
+#define VFIO_USER_DEF_MAX_XFER (1024 * 1024)
+#define VFIO_USER_MAX_MAX_XFER (64 * 1024 * 1024)
+
+/*
+ * The default supported page size is 4k.
+ */
+#define VFIO_USER_DEF_PGSIZE 4096
+
+/*
+ * Default max number of DMA mappings is stolen from the
+ * Linux kernel "dma_entry_limit"
+ */
+#define VFIO_USER_DEF_MAP_MAX 65535
+
+/*
+ * Default max bitmap size is also taken from the Linux kernel,
+ * where usage of signed ints limits the VA range to 2^31 bytes.
+ * Dividing that by the number of bits per byte yields 256MB.
+ */
+#define VFIO_USER_DEF_MAX_BITMAP (256 * 1024 * 1024)
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 5259a30..038e5e3 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -35,6 +35,7 @@ typedef struct VFIOUserMsg {
     uint32_t id;
     QemuCond cv;
     bool complete;
+    bool pending;
     enum msg_type type;
 } VFIOUserMsg;
 
@@ -54,6 +55,12 @@ typedef struct VFIOUserProxy {
     struct QIOChannel *ioc;
     void (*request)(void *opaque, VFIOUserMsg *msg);
     void *req_arg;
+    uint64_t max_xfer_size;
+    uint64_t max_send_fds;
+    uint64_t max_dma;
+    uint64_t dma_pgsizes;
+    uint64_t max_bitmap;
+    uint64_t migr_pgsize;
     int flags;
     QemuCond close_cv;
     AioContext *ctx;
@@ -76,11 +83,13 @@ typedef struct VFIOUserProxy {
 
 /* VFIOProxy flags */
 #define VFIO_PROXY_CLIENT 0x1
+#define VFIO_PROXY_FORCE_QUEUED 0x4
 
 VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp);
 void vfio_user_disconnect(VFIOUserProxy *proxy);
 void vfio_user_set_handler(VFIODevice *vbasedev,
                            void (*handler)(void *opaque, VFIOUserMsg *msg),
                            void *reqarg);
+int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp);
 
 #endif /* VFIO_USER_H */
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index 8b4e3ea..0fe2593 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -40,6 +40,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
 struct VFIOUserPCIDevice {
     VFIOPCIDevice device;
     char *sock_name;
+    bool send_queued; /* all sends are queued */
 };
 
 /*
@@ -100,11 +101,25 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     vbasedev->proxy = proxy;
     vfio_user_set_handler(vbasedev, vfio_user_pci_process_req, vdev);
 
+    if (udev->send_queued) {
+        proxy->flags |= VFIO_PROXY_FORCE_QUEUED;
+    }
+
+    vfio_user_validate_version(proxy, &err);
+    if (err != NULL) {
+        error_propagate(errp, err);
+        goto error;
+    }
+
     vbasedev->name = g_strdup_printf("VFIO user <%s>", udev->sock_name);
     vbasedev->ops = &vfio_user_pci_ops;
     vbasedev->type = VFIO_DEVICE_TYPE_PCI;
     vbasedev->dev = DEVICE(vdev);
 
+    return;
+
+error:
+    error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
 }
 
 static void vfio_user_instance_finalize(Object *obj)
@@ -121,6 +136,7 @@ static void vfio_user_instance_finalize(Object *obj)
 
 static Property vfio_user_pci_dev_properties[] = {
     DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
+    DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index f20e196..2d60f99 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -23,12 +23,20 @@
 #include "io/channel-socket.h"
 #include "io/channel-util.h"
 #include "sysemu/iothread.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qjson.h"
+#include "qapi/qmp/qnull.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qnum.h"
+#include "qapi/qmp/qbool.h"
 #include "user.h"
 #include "trace.h"
 
+static int wait_time = 5000; /* wait up to 5 sec for busy servers */
 static IOThread *vfio_user_iothread;
 
 static void vfio_user_shutdown(VFIOUserProxy *proxy);
+static int vfio_user_send_qio(VFIOUserProxy *proxy, VFIOUserMsg *msg);
 static VFIOUserMsg *vfio_user_getmsg(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                      VFIOUserFDs *fds);
 static VFIOUserFDs *vfio_user_getfds(int numfds);
@@ -36,9 +44,16 @@ static void vfio_user_recycle(VFIOUserProxy *proxy, VFIOUserMsg *msg);
 
 static void vfio_user_recv(void *opaque);
 static int vfio_user_recv_one(VFIOUserProxy *proxy);
+static void vfio_user_send(void *opaque);
+static int vfio_user_send_one(VFIOUserProxy *proxy);
 static void vfio_user_cb(void *opaque);
 
 static void vfio_user_request(void *opaque);
+static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg);
+static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                VFIOUserFDs *fds, int rsize, bool nobql);
+static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
+                                  uint32_t size, uint32_t flags);
 
 static inline void vfio_user_set_error(VFIOUserHdr *hdr, uint32_t err)
 {
@@ -56,6 +71,35 @@ static void vfio_user_shutdown(VFIOUserProxy *proxy)
     qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx, NULL, NULL, NULL);
 }
 
+static int vfio_user_send_qio(VFIOUserProxy *proxy, VFIOUserMsg *msg)
+{
+    VFIOUserFDs *fds = msg->fds;
+    struct iovec iov = {
+        .iov_base = msg->hdr,
+        .iov_len = msg->hdr->size,
+    };
+    size_t numfds = 0;
+    int ret, *fdp = NULL;
+    Error *local_err = NULL;
+
+    if (fds != NULL && fds->send_fds != 0) {
+        numfds = fds->send_fds;
+        fdp = fds->fds;
+    }
+
+    ret = qio_channel_writev_full(proxy->ioc, &iov, 1, fdp, numfds, 0,
+                                  &local_err);
+
+    if (ret == -1) {
+        vfio_user_set_error(msg->hdr, EIO);
+        vfio_user_shutdown(proxy);
+        error_report_err(local_err);
+    }
+    trace_vfio_user_send_write(msg->hdr->id, ret);
+
+    return ret;
+}
+
 static VFIOUserMsg *vfio_user_getmsg(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                      VFIOUserFDs *fds)
 {
@@ -96,6 +140,7 @@ static void vfio_user_recycle(VFIOUserProxy *proxy, VFIOUserMsg *msg)
     msg->hdr = NULL;
     msg->fds = NULL;
     msg->complete = false;
+    msg->pending = false;
     QTAILQ_INSERT_HEAD(&proxy->free, msg, next);
 }
 
@@ -390,6 +435,54 @@ err:
     return -1;
 }
 
+/*
+ * Send messages from outgoing queue when the socket buffer has space.
+ * If we deplete 'outgoing', remove ourselves from the poll list.
+ */
+static void vfio_user_send(void *opaque)
+{
+    VFIOUserProxy *proxy = opaque;
+
+    QEMU_LOCK_GUARD(&proxy->lock);
+
+    if (proxy->state == VFIO_PROXY_CONNECTED) {
+        while (!QTAILQ_EMPTY(&proxy->outgoing)) {
+            if (vfio_user_send_one(proxy) < 0) {
+                return;
+            }
+        }
+        qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
+                                       vfio_user_recv, NULL, proxy);
+    }
+}
+
+/*
+ * Send a single message.
+ *
+ * Sent async messages are freed, others are moved to pending queue.
+ */
+static int vfio_user_send_one(VFIOUserProxy *proxy)
+{
+    VFIOUserMsg *msg;
+    int ret;
+
+    msg = QTAILQ_FIRST(&proxy->outgoing);
+    ret = vfio_user_send_qio(proxy, msg);
+    if (ret < 0) {
+        return ret;
+    }
+
+    QTAILQ_REMOVE(&proxy->outgoing, msg, next);
+    if (msg->type == VFIO_MSG_ASYNC) {
+        vfio_user_recycle(proxy, msg);
+    } else {
+        QTAILQ_INSERT_TAIL(&proxy->pending, msg, next);
+        msg->pending = true;
+    }
+
+    return 0;
+}
+
 static void vfio_user_cb(void *opaque)
 {
     VFIOUserProxy *proxy = opaque;
@@ -451,6 +544,134 @@ static void vfio_user_request(void *opaque)
     }
 }
 
+/*
+ * Messages are queued onto the proxy's outgoing list.
+ *
+ * It handles 3 types of messages:
+ *
+ * async messages - replies and posted writes
+ *
+ *      There will be no reply from the server, so message
+ *      buffers are freed after they're sent.
+ *
+ * nowait messages - map/unmap during address space transactions
+ *
+ *      These are also sent async, but a reply is expected so that
+ *      vfio_wait_reqs() can wait for the youngest nowait request.
+ *      They transition from the outgoing list to the pending list
+ *      when sent, and are freed when the reply is received.
+ *
+ * wait messages - all other requests
+ *
+ *      The reply to these messages is waited for by their caller.
+ *      They also transition from outgoing to pending when sent, but
+ *      the message buffer is returned to the caller with the reply
+ *      contents. The caller is responsible for freeing these messages.
+ *
+ * As an optimization, if the outgoing list and the socket send
+ * buffer are empty, the message is sent inline instead of being
+ * added to the outgoing list. The rest of the transitions are
+ * unchanged.
+ *
+ * returns 0 if the message was sent or queued
+ * returns -1 on send error
+ */
+static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg)
+{
+    int ret;
+
+    /*
+     * Unsent outgoing msgs - add to tail
+     */
+    if (!QTAILQ_EMPTY(&proxy->outgoing)) {
+        QTAILQ_INSERT_TAIL(&proxy->outgoing, msg, next);
+        return 0;
+    }
+
+    /*
+     * Try inline - if blocked, queue it and kick send poller
+     */
+    if (proxy->flags & VFIO_PROXY_FORCE_QUEUED) {
+        ret = QIO_CHANNEL_ERR_BLOCK;
+    } else {
+        ret = vfio_user_send_qio(proxy, msg);
+    }
+    if (ret == QIO_CHANNEL_ERR_BLOCK) {
+        QTAILQ_INSERT_HEAD(&proxy->outgoing, msg, next);
+        qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
+                                       vfio_user_recv, vfio_user_send,
+                                       proxy);
+        return 0;
+    }
+    if (ret == -1) {
+        return ret;
+    }
+
+    /*
+     * Sent - free async, add others to pending
+     */
+    if (msg->type == VFIO_MSG_ASYNC) {
+        vfio_user_recycle(proxy, msg);
+    } else {
+        QTAILQ_INSERT_TAIL(&proxy->pending, msg, next);
+        msg->pending = true;
+    }
+
+    return 0;
+}
+
+static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                VFIOUserFDs *fds, int rsize, bool nobql)
+{
+    VFIOUserMsg *msg;
+    bool iolock = false;
+    int ret;
+
+    if (hdr->flags & VFIO_USER_NO_REPLY) {
+        error_printf("vfio_user_send_wait on async message\n");
+        vfio_user_set_error(hdr, EINVAL);
+        return;
+    }
+
+    /*
+     * We may block later, so use a per-proxy lock and drop
+     * BQL while we sleep unless 'nobql' says not to.
+     */
+    qemu_mutex_lock(&proxy->lock);
+    if (!nobql) {
+        iolock = qemu_mutex_iothread_locked();
+        if (iolock) {
+            qemu_mutex_unlock_iothread();
+        }
+    }
+
+    msg = vfio_user_getmsg(proxy, hdr, fds);
+    msg->id = hdr->id;
+    msg->rsize = rsize ? rsize : hdr->size;
+    msg->type = VFIO_MSG_WAIT;
+
+    ret = vfio_user_send_queued(proxy, msg);
+
+    if (ret == 0) {
+        while (!msg->complete) {
+            if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, wait_time)) {
+                VFIOUserMsgQ *list;
+
+                list = msg->pending ? &proxy->pending : &proxy->outgoing;
+                QTAILQ_REMOVE(list, msg, next);
+                vfio_user_set_error(hdr, ETIMEDOUT);
+                break;
+            }
+        }
+    }
+    vfio_user_recycle(proxy, msg);
+
+    /* lock order is BQL->proxy - don't hold proxy when getting BQL */
+    qemu_mutex_unlock(&proxy->lock);
+    if (iolock) {
+        qemu_mutex_lock_iothread();
+    }
+}
 
 static QLIST_HEAD(, VFIOUserProxy) vfio_user_sockets =
     QLIST_HEAD_INITIALIZER(vfio_user_sockets);
@@ -479,6 +700,15 @@ VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp)
     proxy = g_malloc0(sizeof(VFIOUserProxy));
     proxy->sockname = g_strdup_printf("unix:%s", sockname);
     proxy->ioc = ioc;
+
+    /* init defaults */
+    proxy->max_xfer_size = VFIO_USER_DEF_MAX_XFER;
+    proxy->max_send_fds = VFIO_USER_DEF_MAX_FDS;
+    proxy->max_dma = VFIO_USER_DEF_MAP_MAX;
+    proxy->dma_pgsizes = VFIO_USER_DEF_PGSIZE;
+    proxy->max_bitmap = VFIO_USER_DEF_MAX_BITMAP;
+    proxy->migr_pgsize = VFIO_USER_DEF_PGSIZE;
+
     proxy->flags = VFIO_PROXY_CLIENT;
     proxy->state = VFIO_PROXY_CONNECTED;
 
@@ -576,3 +806,285 @@ void vfio_user_disconnect(VFIOUserProxy *proxy)
     g_free(proxy->sockname);
     g_free(proxy);
 }
+
+static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
+                                  uint32_t size, uint32_t flags)
+{
+    static uint16_t next_id;
+
+    hdr->id = qatomic_fetch_inc(&next_id);
+    hdr->command = cmd;
+    hdr->size = size;
+    hdr->flags = (flags & ~VFIO_USER_TYPE) | VFIO_USER_REQUEST;
+    hdr->error_reply = 0;
+}
+
+struct cap_entry {
+    const char *name;
+    int (*check)(VFIOUserProxy *proxy, QObject *qobj, Error **errp);
+};
+
+static int caps_parse(VFIOUserProxy *proxy, QDict *qdict,
+                      struct cap_entry caps[], Error **errp)
+{
+    QObject *qobj;
+    struct cap_entry *p;
+
+    for (p = caps; p->name != NULL; p++) {
+        qobj = qdict_get(qdict, p->name);
+        if (qobj != NULL) {
+            if (p->check(proxy, qobj, errp)) {
+                return -1;
+            }
+            qdict_del(qdict, p->name);
+        }
+    }
+
+    /* warning, for now */
+    if (qdict_size(qdict) != 0) {
+        warn_report("spurious capabilities");
+    }
+    return 0;
+}
+
+static int check_migr_pgsize(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t pgsize;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &pgsize)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_PGSIZE);
+        return -1;
+    }
+
+    /* must be a multiple of the default page size */
+    if (pgsize & (VFIO_USER_DEF_PGSIZE - 1)) {
+        error_setg(errp, "pgsize 0x%"PRIx64" too small", pgsize);
+        return -1;
+    }
+
+    proxy->migr_pgsize = pgsize;
+    return 0;
+}
+
+static int check_bitmap(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t bitmap_size;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &bitmap_size)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAX_BITMAP);
+        return -1;
+    }
+
+    /* can only lower it */
+    if (bitmap_size > VFIO_USER_DEF_MAX_BITMAP) {
+        error_setg(errp, "%s too large", VFIO_USER_CAP_MAX_BITMAP);
+        return -1;
+    }
+
+    proxy->max_bitmap = bitmap_size;
+    return 0;
+}
+
+static struct cap_entry caps_migr[] = {
+    { VFIO_USER_CAP_PGSIZE, check_migr_pgsize },
+    { VFIO_USER_CAP_MAX_BITMAP, check_bitmap },
+    { NULL }
+};
+
+static int check_max_fds(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t max_send_fds;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &max_send_fds) ||
+        max_send_fds > VFIO_USER_MAX_MAX_FDS) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAX_FDS);
+        return -1;
+    }
+    proxy->max_send_fds = max_send_fds;
+    return 0;
+}
+
+static int check_max_xfer(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t max_xfer_size;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &max_xfer_size) ||
+        max_xfer_size > VFIO_USER_MAX_MAX_XFER) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAX_XFER);
+        return -1;
+    }
+    proxy->max_xfer_size = max_xfer_size;
+    return 0;
+}
+
+static int check_pgsizes(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t pgsizes;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &pgsizes)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_PGSIZES);
+        return -1;
+    }
+
+    /* must be a multiple of the default page size */
+    if (pgsizes & (VFIO_USER_DEF_PGSIZE - 1)) {
+        error_setg(errp, "pgsize 0x%"PRIx64" too small", pgsizes);
+        return -1;
+    }
+
+    proxy->dma_pgsizes = pgsizes;
+    return 0;
+}
+
+static int check_max_dma(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QNum *qn = qobject_to(QNum, qobj);
+    uint64_t max_dma;
+
+    if (qn == NULL || !qnum_get_try_uint(qn, &max_dma)) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MAP_MAX);
+        return -1;
+    }
+
+    /* can only lower it */
+    if (max_dma > VFIO_USER_DEF_MAP_MAX) {
+        error_setg(errp, "%s too large", VFIO_USER_CAP_MAP_MAX);
+        return -1;
+    }
+
+    proxy->max_dma = max_dma;
+    return 0;
+}
+
+static int check_migr(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QDict *qdict = qobject_to(QDict, qobj);
+
+    if (qdict == NULL) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MIGR);
+        return -1;
+    }
+    return caps_parse(proxy, qdict, caps_migr, errp);
+}
+
+static struct cap_entry caps_cap[] = {
+    { VFIO_USER_CAP_MAX_FDS, check_max_fds },
+    { VFIO_USER_CAP_MAX_XFER, check_max_xfer },
+    { VFIO_USER_CAP_PGSIZES, check_pgsizes },
+    { VFIO_USER_CAP_MAP_MAX, check_max_dma },
+    { VFIO_USER_CAP_MIGR, check_migr },
+    { NULL }
+};
+
+static int check_cap(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QDict *qdict = qobject_to(QDict, qobj);
+
+    if (qdict == NULL) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP);
+        return -1;
+    }
+    return caps_parse(proxy, qdict, caps_cap, errp);
+}
+
+static struct cap_entry ver_0_0[] = {
+    { VFIO_USER_CAP, check_cap },
+    { NULL }
+};
+
+static int caps_check(VFIOUserProxy *proxy, int minor, const char *caps,
+                      Error **errp)
+{
+    QObject *qobj;
+    QDict *qdict;
+    int ret;
+
+    qobj = qobject_from_json(caps, NULL);
+    if (qobj == NULL) {
+        error_setg(errp, "malformed capabilities %s", caps);
+        return -1;
+    }
+    qdict = qobject_to(QDict, qobj);
+    if (qdict == NULL) {
+        error_setg(errp, "capabilities %s not an object", caps);
+        qobject_unref(qobj);
+        return -1;
+    }
+    ret = caps_parse(proxy, qdict, ver_0_0, errp);
+
+    qobject_unref(qobj);
+    return ret;
+}
+
+static GString *caps_json(void)
+{
+    QDict *dict = qdict_new();
+    QDict *capdict = qdict_new();
+    QDict *migdict = qdict_new();
+    GString *str;
+
+    qdict_put_int(migdict, VFIO_USER_CAP_PGSIZE, VFIO_USER_DEF_PGSIZE);
+    qdict_put_int(migdict, VFIO_USER_CAP_MAX_BITMAP, VFIO_USER_DEF_MAX_BITMAP);
+    qdict_put_obj(capdict, VFIO_USER_CAP_MIGR, QOBJECT(migdict));
+
+    qdict_put_int(capdict, VFIO_USER_CAP_MAX_FDS, VFIO_USER_MAX_MAX_FDS);
+    qdict_put_int(capdict, VFIO_USER_CAP_MAX_XFER, VFIO_USER_DEF_MAX_XFER);
+    qdict_put_int(capdict, VFIO_USER_CAP_PGSIZES, VFIO_USER_DEF_PGSIZE);
+    qdict_put_int(capdict, VFIO_USER_CAP_MAP_MAX, VFIO_USER_DEF_MAP_MAX);
+
+    qdict_put_obj(dict, VFIO_USER_CAP, QOBJECT(capdict));
+
+    str = qobject_to_json(QOBJECT(dict));
+    qobject_unref(dict);
+    return str;
+}
+
+int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp)
+{
+    g_autofree VFIOUserVersion *msgp = NULL;
+    GString *caps;
+    char *reply;
+    int size, caplen;
+
+    caps = caps_json();
+    caplen = caps->len + 1;
+    size = sizeof(*msgp) + caplen;
+    msgp = g_malloc0(size);
+
+    vfio_user_request_msg(&msgp->hdr, VFIO_USER_VERSION, size, 0);
+    msgp->major = VFIO_USER_MAJOR_VER;
+    msgp->minor = VFIO_USER_MINOR_VER;
+    memcpy(&msgp->capabilities, caps->str, caplen);
+    g_string_free(caps, true);
+    trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities);
+
+    vfio_user_send_wait(proxy, &msgp->hdr, NULL, 0, false);
+    if (msgp->hdr.flags & VFIO_USER_ERROR) {
+        error_setg_errno(errp, msgp->hdr.error_reply, "version reply");
+        return -1;
+    }
+
+    if (msgp->major != VFIO_USER_MAJOR_VER ||
+        msgp->minor > VFIO_USER_MINOR_VER) {
+        error_setg(errp, "incompatible server version");
+        return -1;
+    }
+
+    reply = msgp->capabilities;
+    if (reply[msgp->hdr.size - sizeof(*msgp) - 1] != '\0') {
+        error_setg(errp, "corrupt version reply");
+        return -1;
+    }
+
+    if (caps_check(proxy, msgp->minor, reply, errp) != 0) {
+        return -1;
+    }
+
+    trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities);
+    return 0;
+}
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73cc121..e3640bc 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -171,3 +171,5 @@ vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty
 vfio_user_recv_hdr(const char *name, uint16_t id, uint16_t cmd, uint32_t size, uint32_t flags) " (%s) id 0x%x cmd 0x%x size 0x%x flags 0x%x"
 vfio_user_recv_read(uint16_t id, int read) " id 0x%x read 0x%x"
 vfio_user_recv_request(uint16_t cmd) " command 0x%x"
+vfio_user_send_write(uint16_t id, int wrote) " id 0x%x wrote 0x%x"
+vfio_user_version(uint16_t major, uint16_t minor, const char *caps) " major %d minor %d caps: %s"
-- 
1.9.4
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 10/23] vfio-user: get device info
Date: Wed, 1 Feb 2023 21:55:46 -0800
Message-Id: <8bc3133469b42eecc3a8af8200b5ce9692e6f3e9.1675228037.git.john.g.johnson@oracle.com>

Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user-protocol.h       | 12 ++++++++++
 hw/vfio/user.h                |  1 +
 include/hw/vfio/vfio-common.h |  3 +++
 hw/vfio/common.c              | 23 +++++++++++++++-------
 hw/vfio/user-pci.c            |  6 +++++
 hw/vfio/user.c                | 55 +++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events          |  1 +
 7 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 5de5b20..5f9ef17 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -113,4 +113,16 @@ typedef struct {
  */
 #define VFIO_USER_DEF_MAX_BITMAP (256 * 1024 * 1024)
 
+/*
+ * VFIO_USER_DEVICE_GET_INFO
+ * imported from struct vfio_device_info
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint32_t argsz;
+    uint32_t flags;
+    uint32_t num_regions;
+    uint32_t num_irqs;
+} VFIOUserDeviceInfo;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 038e5e3..d148661 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -90,6 +90,7 @@ void vfio_user_disconnect(VFIOUserProxy *proxy);
 void vfio_user_set_handler(VFIODevice *vbasedev,
                            void (*handler)(void *opaque, VFIOUserMsg *msg),
                            void *reqarg);
+int vfio_user_get_device(VFIODevice *vbasedev, Error **errp);
 int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp);
 
 #endif /* VFIO_USER_H */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e1ee0ac..0962e37 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -254,6 +254,9 @@ void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
 
+void vfio_init_device(VFIODevice *vbasedev, VFIOGroup *group,
+                      struct vfio_device_info *info);
+
 extern const MemoryRegionOps vfio_region_ops;
 typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 extern VFIOGroupList vfio_group_list;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 45b950a..792e247 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -2369,6 +2369,20 @@ void vfio_get_all_regions(VFIODevice *vbasedev)
     }
 }
 
+void vfio_init_device(VFIODevice *vbasedev, VFIOGroup *group,
+                      struct vfio_device_info *info)
+{
+    vbasedev->group = group;
+    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
+
+    vbasedev->num_irqs = info->num_irqs;
+    vbasedev->num_regions = info->num_regions;
+    vbasedev->flags = info->flags;
+    vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET);
+
+    vfio_get_all_regions(vbasedev);
+}
+
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp)
 {
@@ -2414,18 +2428,11 @@ int vfio_get_device(VFIOGroup *group, const char *name,
     }
 
     vbasedev->fd = fd;
-    vbasedev->group = group;
-    QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
-
-    vbasedev->num_irqs = dev_info.num_irqs;
-    vbasedev->num_regions = dev_info.num_regions;
-    vbasedev->flags = dev_info.flags;
+    vfio_init_device(vbasedev, group, &dev_info);
 
     trace_vfio_get_device(name, dev_info.flags, dev_info.num_regions,
                           dev_info.num_irqs);
 
-    vfio_get_all_regions(vbasedev);
-    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
     return 0;
 }
 
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index 0fe2593..e5a9450 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -77,6 +77,7 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     VFIODevice *vbasedev = &vdev->vbasedev;
     SocketAddress addr;
     VFIOUserProxy *proxy;
+    int ret;
     Error *err = NULL;
 
     /*
@@ -116,6 +117,11 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     vbasedev->type = VFIO_DEVICE_TYPE_PCI;
     vbasedev->dev = DEVICE(vdev);
 
+    ret = vfio_user_get_device(vbasedev, errp);
+    if (ret) {
+        goto error;
+    }
+
     return;
 
 error:
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 2d60f99..d0ec14c 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -32,6 +32,14 @@
 #include "user.h"
 #include "trace.h"
 
+
+/*
+ * These are to defend against a malicious server trying
+ * to force us to run out of memory.
+ */
+#define VFIO_USER_MAX_REGIONS 100
+#define VFIO_USER_MAX_IRQS 50
+
 static int wait_time = 5000; /* wait up to 5 sec for busy servers */
 static IOThread *vfio_user_iothread;
 
@@ -55,6 +63,9 @@ static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
 static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
                                   uint32_t size, uint32_t flags);
 
+static int vfio_user_get_info(VFIOUserProxy *proxy,
+                              struct vfio_device_info *info);
+
 static inline void vfio_user_set_error(VFIOUserHdr *hdr, uint32_t err)
 {
     hdr->flags |= VFIO_USER_ERROR;
@@ -807,6 +818,30 @@ void vfio_user_disconnect(VFIOUserProxy *proxy)
     g_free(proxy);
 }
 
+int vfio_user_get_device(VFIODevice *vbasedev, Error **errp)
+{
+    struct vfio_device_info info = { .argsz = sizeof(info) };
+    int ret;
+
+    ret = vfio_user_get_info(vbasedev->proxy, &info);
+    if (ret) {
+        error_setg_errno(errp, -ret, "get info failure");
+        return ret;
+    }
+
+    /* defend against a malicious server */
+    if (info.num_regions > VFIO_USER_MAX_REGIONS ||
+        info.num_irqs > VFIO_USER_MAX_IRQS) {
+        error_printf("vfio_user_get_info: invalid reply\n");
+        return -EINVAL;
+    }
+
+    vbasedev->fd = -1;
+    vfio_init_device(vbasedev, NULL, &info);
+
+    return 0;
+}
+
 static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
                                   uint32_t size, uint32_t flags)
 {
@@ -1088,3 +1123,23 @@ int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp)
     trace_vfio_user_version(msgp->major, msgp->minor, msgp->capabilities);
     return 0;
 }
+
+static int vfio_user_get_info(VFIOUserProxy *proxy,
+                              struct vfio_device_info *info)
+{
+    VFIOUserDeviceInfo msg;
+    uint32_t argsz = sizeof(msg) - sizeof(msg.hdr);
+
+    memset(&msg, 0, sizeof(msg));
+    vfio_user_request_msg(&msg.hdr, VFIO_USER_DEVICE_GET_INFO, sizeof(msg), 0);
+    msg.argsz = argsz;
+
+    vfio_user_send_wait(proxy, &msg.hdr, NULL, 0, false);
+    if (msg.hdr.flags & VFIO_USER_ERROR) {
+        return -msg.hdr.error_reply;
+    }
+    trace_vfio_user_get_info(msg.num_regions, msg.num_irqs);
+
+    memcpy(info, &msg.argsz, argsz);
+    return 0;
+}
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index e3640bc..ff903c0 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -173,3 +173,4 @@ vfio_user_recv_read(uint16_t id, int read) " id 0x%x read 0x%x"
 vfio_user_recv_request(uint16_t cmd) " command 0x%x"
 vfio_user_send_write(uint16_t id, int wrote) " id 0x%x wrote 0x%x"
 vfio_user_version(uint16_t major, uint16_t minor, const char *caps) " major %d minor %d caps: %s"
+vfio_user_get_info(uint32_t nregions, uint32_t nirqs) " #regions %d #irqs %d"
-- 
1.9.4
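The GET_INFO path above does two things that can be shown in isolation. `vfio_user_get_info()` copies the payload that sits directly after the message header. `vfio_user_get_device()` then refuses replies whose region or IRQ counts exceed fixed limits. The sketch below is a minimal, self-contained illustration of both steps, not QEMU code. All `sketch_*` names and the header field layout are assumptions made for the example; only the limits of 100 and 50 come from the patch. One difference from the patch: the sketch writes to `*out` only after the check passes.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Same limits as VFIO_USER_MAX_REGIONS / VFIO_USER_MAX_IRQS in the patch. */
#define SKETCH_MAX_REGIONS 100
#define SKETCH_MAX_IRQS    50

/* Hypothetical stand-in for VFIOUserHdr (16 bytes, no padding). */
struct sketch_hdr {
    uint16_t id;
    uint16_t command;
    uint32_t size;
    uint32_t flags;
    uint32_t error_reply;
};

/* Payload laid out like the first four members of struct vfio_device_info. */
struct sketch_dev_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t num_regions;
    uint32_t num_irqs;
};

/* Reply as received: the header immediately followed by the payload. */
struct sketch_dev_info_msg {
    struct sketch_hdr hdr;
    struct sketch_dev_info payload;
};

/* Copying sizeof(msg) - sizeof(msg.hdr) bytes is only safe if this holds. */
_Static_assert(sizeof(struct sketch_dev_info_msg) - sizeof(struct sketch_hdr) ==
               sizeof(struct sketch_dev_info),
               "payload must follow the header directly");

/*
 * Copy the payload out of a reply and reject counts that would make the
 * client allocate unbounded per-region or per-IRQ state.
 * Returns 0 on success or -EINVAL; *out is written only on success.
 */
static int sketch_accept_dev_info(const struct sketch_dev_info_msg *reply,
                                  struct sketch_dev_info *out)
{
    struct sketch_dev_info tmp;
    size_t argsz = sizeof(*reply) - sizeof(reply->hdr);

    memcpy(&tmp, &reply->payload, argsz);
    if (tmp.num_regions > SKETCH_MAX_REGIONS ||
        tmp.num_irqs > SKETCH_MAX_IRQS) {
        return -EINVAL;
    }
    *out = tmp;
    return 0;
}
```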
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 11/23] vfio-user: get region info
Date: Wed, 1 Feb 2023 21:55:47 -0800
Add per-region FD to support mmap() of remote device regions

Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user-protocol.h       | 14 +++++++++
 hw/vfio/user.h                |  2 ++
 include/hw/vfio/vfio-common.h |  5 +++-
 hw/vfio/ap.c                  |  1 +
 hw/vfio/ccw.c                 |  1 +
 hw/vfio/common.c              | 31 ++++++++++++++++++--
 hw/vfio/pci.c                 |  1 +
 hw/vfio/platform.c            |  1 +
 hw/vfio/user-pci.c            |  2 ++
 hw/vfio/user.c                | 68 +++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events          |  1 +
 11 files changed, 123 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 5f9ef17..6f70a48 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -125,4 +125,18 @@ typedef struct {
     uint32_t num_irqs;
 } VFIOUserDeviceInfo;
 
+/*
+ * VFIO_USER_DEVICE_GET_REGION_INFO
+ * imported from struct vfio_region_info
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint32_t argsz;
+    uint32_t flags;
+    uint32_t index;
+    uint32_t cap_offset;
+    uint64_t size;
+    uint64_t offset;
+} VFIOUserRegionInfo;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index d148661..e6485dc 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -93,4 +93,6 @@ void vfio_user_set_handler(VFIODevice *vbasedev,
 int vfio_user_get_device(VFIODevice *vbasedev, Error **errp);
 int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp);
 
+extern VFIODeviceIO vfio_dev_io_sock;
+
 #endif /* VFIO_USER_H */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 0962e37..9fb4c80 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -56,6 +56,7 @@ typedef struct VFIORegion {
     uint32_t nr_mmaps;
     VFIOMmap *mmaps;
     uint8_t nr; /* cache the region number for debug */
+    int fd; /* fd to mmap() region */
 } VFIORegion;
 
 typedef struct VFIOMigration {
@@ -140,6 +141,7 @@ typedef struct VFIODevice {
     bool no_mmap;
     bool ram_block_discard_allowed;
     bool enable_migration;
+    bool use_regfds;
     VFIODeviceOps *ops;
     VFIODeviceIO *io;
     unsigned int num_irqs;
@@ -150,6 +152,7 @@ typedef struct VFIODevice {
     OnOffAuto pre_copy_dirty_page_tracking;
     VFIOUserProxy *proxy;
     struct vfio_region_info **regions;
+    int *regfds;
 } VFIODevice;
 
 struct VFIODeviceOps {
@@ -171,7 +174,7 @@ struct VFIODeviceOps {
  */
 struct VFIODeviceIO {
     int (*get_region_info)(VFIODevice *vdev,
-                           struct vfio_region_info *info);
+                           struct vfio_region_info *info, int *fd);
     int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
     int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
     int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index c6638d5..06d745f 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -103,6 +103,7 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
     vapdev->vdev.name = g_strdup_printf("%s", mdevid);
     vapdev->vdev.dev = dev;
     vapdev->vdev.io = &vfio_dev_io_ioctl;
+    vapdev->vdev.use_regfds = false;
 
     /*
      * vfio-ap devices operate in a way compatible with discarding of
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index e4d840d..00605bd 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -615,6 +615,7 @@ static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev,
     vcdev->vdev.name = name;
     vcdev->vdev.dev = &vcdev->cdev.parent_obj.parent_obj;
     vcdev->vdev.io = &vfio_dev_io_ioctl;
+    vcdev->vdev.use_regfds = false;
 
     return;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 792e247..d26b325 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1584,6 +1584,11 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
     region->size = info->size;
     region->fd_offset = info->offset;
     region->nr = index;
+    if (vbasedev->regfds != NULL) {
+        region->fd = vbasedev->regfds[index];
+    } else {
+        region->fd = vbasedev->fd;
+    }
 
     if (region->size) {
         region->mem = g_new0(MemoryRegion, 1);
@@ -1635,7 +1640,7 @@ int vfio_region_mmap(VFIORegion *region)
 
     for (i = 0; i < region->nr_mmaps; i++) {
         region->mmaps[i].mmap = mmap(NULL, region->mmaps[i].size, prot,
-                                     MAP_SHARED, region->vbasedev->fd,
+                                     MAP_SHARED, region->fd,
                                      region->fd_offset +
                                      region->mmaps[i].offset);
         if (region->mmaps[i].mmap == MAP_FAILED) {
@@ -2442,10 +2447,17 @@ void vfio_put_base_device(VFIODevice *vbasedev)
         int i;
 
         for (i = 0; i < vbasedev->num_regions; i++) {
+            if (vbasedev->regfds != NULL && vbasedev->regfds[i] != -1) {
+                close(vbasedev->regfds[i]);
+            }
             g_free(vbasedev->regions[i]);
         }
         g_free(vbasedev->regions);
         vbasedev->regions = NULL;
+        if (vbasedev->regfds != NULL) {
+            g_free(vbasedev->regfds);
+            vbasedev->regfds = NULL;
+        }
     }
 
     if (!vbasedev->group) {
@@ -2461,12 +2473,16 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
                          struct vfio_region_info **info)
 {
     size_t argsz = sizeof(struct vfio_region_info);
+    int fd = -1;
     int ret;
 
     /* create region cache */
     if (vbasedev->regions == NULL) {
         vbasedev->regions = g_new0(struct vfio_region_info *,
                                    vbasedev->num_regions);
+        if (vbasedev->use_regfds) {
+            vbasedev->regfds = g_new0(int, vbasedev->num_regions);
+        }
     }
     /* check cache */
     if (vbasedev->regions[index] != NULL) {
@@ -2480,7 +2496,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
 retry:
     (*info)->argsz = argsz;
 
-    ret = vbasedev->io->get_region_info(vbasedev, *info);
+    ret = vbasedev->io->get_region_info(vbasedev, *info, &fd);
     if (ret != 0) {
         g_free(*info);
         *info = NULL;
@@ -2490,12 +2506,19 @@ retry:
     if ((*info)->argsz > argsz) {
         argsz = (*info)->argsz;
         *info = g_realloc(*info, argsz);
+        if (fd != -1) {
+            close(fd);
+            fd = -1;
+        }
 
         goto retry;
     }
 
     /* fill cache */
     vbasedev->regions[index] = *info;
+    if (vbasedev->regfds != NULL) {
+        vbasedev->regfds[index] = fd;
+    }
 
     return 0;
 }
@@ -2646,10 +2669,12 @@ int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
  */
 
 static int vfio_io_get_region_info(VFIODevice *vbasedev,
-                                   struct vfio_region_info *info)
+                                   struct vfio_region_info *info,
+                                   int *fd)
 {
     int ret;
 
+    *fd = -1;
     ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
 
     return ret < 0 ? -errno : ret;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a8bc0ea..935d247 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2908,6 +2908,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vbasedev->type = VFIO_DEVICE_TYPE_PCI;
     vbasedev->dev = DEVICE(vdev);
     vbasedev->io = &vfio_dev_io_ioctl;
+    vbasedev->use_regfds = false;
 
     tmp = g_strdup_printf("%s/iommu_group", vbasedev->sysfsdev);
     len = readlink(tmp, group_path, sizeof(group_path));
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 222405e..8ddfcca 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -622,6 +622,7 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
     vbasedev->dev = dev;
     vbasedev->ops = &vfio_platform_ops;
     vbasedev->io = &vfio_dev_io_ioctl;
+    vbasedev->use_regfds = false;
 
     qemu_mutex_init(&vdev->intp_mutex);
 
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index e5a9450..09c6c98 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -116,6 +116,8 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     vbasedev->ops = &vfio_user_pci_ops;
     vbasedev->type = VFIO_DEVICE_TYPE_PCI;
     vbasedev->dev = DEVICE(vdev);
+    vbasedev->io = &vfio_dev_io_sock;
+    vbasedev->use_regfds = true;
 
     ret = vfio_user_get_device(vbasedev, errp);
     if (ret) {
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index d0ec14c..a05ba80 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1143,3 +1143,71 @@ static int vfio_user_get_info(VFIOUserProxy *proxy,
     memcpy(info, &msg.argsz, argsz);
     return 0;
 }
+
+static int vfio_user_get_region_info(VFIOUserProxy *proxy,
+                                     struct vfio_region_info *info,
+                                     VFIOUserFDs *fds)
+{
+    g_autofree VFIOUserRegionInfo *msgp = NULL;
+    uint32_t size;
+
+    /* data returned can be larger than vfio_region_info */
+    if (info->argsz < sizeof(*info)) {
+        error_printf("vfio_user_get_region_info argsz too small\n");
+        return -E2BIG;
+    }
+    if (fds != NULL && fds->send_fds != 0) {
+        error_printf("vfio_user_get_region_info can't send FDs\n");
+        return -EINVAL;
+    }
+
+    size = info->argsz + sizeof(VFIOUserHdr);
+    msgp = g_malloc0(size);
+
+    vfio_user_request_msg(&msgp->hdr, VFIO_USER_DEVICE_GET_REGION_INFO,
+                          sizeof(*msgp), 0);
+    msgp->argsz = info->argsz;
+    msgp->index = info->index;
+
+    vfio_user_send_wait(proxy, &msgp->hdr, fds, size, false);
+    if (msgp->hdr.flags & VFIO_USER_ERROR) {
+        return -msgp->hdr.error_reply;
+    }
+    trace_vfio_user_get_region_info(msgp->index, msgp->flags, msgp->size);
+
+    memcpy(info, &msgp->argsz, info->argsz);
+    return 0;
+}
+
+
+/*
+ * Socket-based io_ops
+ */
+
+static int vfio_user_io_get_region_info(VFIODevice *vbasedev,
+                                        struct vfio_region_info *info,
+                                        int *fd)
+{
+    int ret;
+    VFIOUserFDs fds = { 0, 1, fd};
+
+    ret = vfio_user_get_region_info(vbasedev->proxy, info, &fds);
+    if (ret) {
+        return ret;
+    }
+
+    if (info->index > vbasedev->num_regions) {
+        return -EINVAL;
+    }
+    /* cap_offset in valid area */
+    if ((info->flags & VFIO_REGION_INFO_FLAG_CAPS) &&
+        (info->cap_offset < sizeof(*info) || info->cap_offset > info->argsz)) {
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+VFIODeviceIO vfio_dev_io_sock = {
+    .get_region_info = vfio_user_io_get_region_info,
+};
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index ff903c0..939113a 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -174,3 +174,4 @@ vfio_user_recv_request(uint16_t cmd) " command 0x%x"
 vfio_user_send_write(uint16_t id, int wrote) " id 0x%x wrote 0x%x"
 vfio_user_version(uint16_t major, uint16_t minor, const char *caps) " major %d minor %d caps: %s"
 vfio_user_get_info(uint32_t nregions, uint32_t nirqs) " #regions %d #irqs %d"
+vfio_user_get_region_info(uint32_t index, uint32_t flags, uint64_t size) " index %d flags 0x%x size 0x%"PRIx64
-- 
1.9.4
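The grow-and-retry loop that this patch adds to `vfio_get_region_info()` can be sketched on its own. This is a hypothetical, self-contained version, not the QEMU implementation. `sketch_query_fn` stands in for `VFIODeviceIO.get_region_info`. `sketch_stub_query` is a made-up server that first asks for a 48-byte buffer and sends a pipe fd with every reply. That makes it visible why the fd from the undersized reply is closed before retrying. The sketch uses `calloc`/`realloc` instead of GLib and assumes a POSIX system for `pipe()` and `close()`.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Payload fields of VFIOUserRegionInfo (header omitted), 32 bytes. */
struct sketch_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t cap_offset;
    uint64_t size;
    uint64_t offset;
};

/* Hypothetical transport hook, standing in for io->get_region_info(). */
typedef int (*sketch_query_fn)(void *ctx, struct sketch_region_info *info,
                               int *fd);

/*
 * Query region info, growing the buffer while the reply asks for more.
 * An fd delivered with an undersized reply is closed before retrying,
 * so at most one fd is kept. On success the caller owns *out and *out_fd.
 */
static int sketch_get_region_info(sketch_query_fn query, void *ctx,
                                  uint32_t index,
                                  struct sketch_region_info **out,
                                  int *out_fd)
{
    size_t argsz = sizeof(struct sketch_region_info);
    struct sketch_region_info *info = calloc(1, argsz);
    int fd = -1;
    int ret;

    if (info == NULL) {
        return -ENOMEM;
    }
    for (;;) {
        info->argsz = (uint32_t)argsz;
        info->index = index;
        ret = query(ctx, info, &fd);
        if (ret != 0) {
            if (fd != -1) {
                close(fd);
            }
            free(info);
            return ret;
        }
        if (info->argsz <= argsz) {
            break;                      /* reply fit in the buffer */
        }
        argsz = info->argsz;            /* server needs a larger buffer */
        if (fd != -1) {
            close(fd);                  /* drop the fd from the short reply */
            fd = -1;
        }
        struct sketch_region_info *bigger = realloc(info, argsz);
        if (bigger == NULL) {
            free(info);
            return -ENOMEM;
        }
        info = bigger;
    }
    *out = info;
    *out_fd = fd;
    return 0;
}

/* Hypothetical server stub: wants `needed` bytes, sends a pipe fd each time. */
struct sketch_stub {
    uint32_t needed;
    int calls;
};

static int sketch_stub_query(void *ctx, struct sketch_region_info *info,
                             int *fd)
{
    struct sketch_stub *s = ctx;
    int p[2];

    s->calls++;
    if (pipe(p) != 0) {
        return -errno;
    }
    close(p[1]);
    *fd = p[0];
    if (info->argsz < s->needed) {
        info->argsz = s->needed;        /* "retry with this much room" */
        return 0;
    }
    info->size = 0x1000;
    return 0;
}
```

Keeping only the fd from the final reply matches the patch, which stores that fd in `vbasedev->regfds[index]` alongside the cached `vfio_region_info`.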
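The reply checks in `vfio_user_io_get_region_info()` and the fd selection in `vfio_region_setup()` in the region-info patch reduce to two small predicates. They are sketched here with hypothetical `sketch_*` names. `SKETCH_REGION_FLAG_CAPS` mirrors `VFIO_REGION_INFO_FLAG_CAPS`, which is bit 3 in `<linux/vfio.h>`. One deliberate difference: the patch rejects `info->index > vbasedev->num_regions`, while this sketch rejects `index >= num_regions`. A cache of `num_regions` entries is indexed 0 through `num_regions - 1`, so `index == num_regions` is also out of range.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors VFIO_REGION_INFO_FLAG_CAPS (bit 3) from <linux/vfio.h>. */
#define SKETCH_REGION_FLAG_CAPS (1u << 3)

/* Payload fields of VFIOUserRegionInfo (header omitted), 32 bytes. */
struct sketch_region_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t cap_offset;
    uint64_t size;
    uint64_t offset;
};

/*
 * Reject a reply whose index falls outside the per-device region cache,
 * or whose capability chain would start inside the fixed struct or past
 * the end of the returned data.
 */
static int sketch_check_region_reply(const struct sketch_region_info *info,
                                     uint32_t num_regions)
{
    if (info->index >= num_regions) {
        return -EINVAL;
    }
    if ((info->flags & SKETCH_REGION_FLAG_CAPS) &&
        (info->cap_offset < sizeof(*info) || info->cap_offset > info->argsz)) {
        return -EINVAL;
    }
    return 0;
}

/* fd to mmap() a region: the per-region fd if the transport keeps them, else the device fd. */
static int sketch_region_mmap_fd(const int *regfds, uint32_t index,
                                 int device_fd)
{
    return regfds != NULL ? regfds[index] : device_fd;
}
```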
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 12/23] vfio-user: region read/write
Date: Wed, 1 Feb 2023 21:55:48 -0800
Message-Id: <83ec17255d41c90eb3950364dd853b240398705b.1675228037.git.john.g.johnson@oracle.com>
qemu-devel-bounces+importer2=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @oracle.com) X-ZM-MESSAGEID: 1675316921991100001 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Add support for posted writes on remote devices Signed-off-by: Elena Ufimtseva Signed-off-by: John G Johnson Signed-off-by: Jagannathan Raman --- hw/vfio/user-protocol.h | 12 +++++ hw/vfio/user.h | 1 + include/hw/vfio/vfio-common.h | 3 +- hw/vfio/common.c | 23 ++++++--- hw/vfio/pci.c | 5 +- hw/vfio/user-pci.c | 5 ++ hw/vfio/user.c | 112 ++++++++++++++++++++++++++++++++++++++= ++++ hw/vfio/trace-events | 1 + 8 files changed, 154 insertions(+), 8 deletions(-) diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h index 6f70a48..6987435 100644 --- a/hw/vfio/user-protocol.h +++ b/hw/vfio/user-protocol.h @@ -139,4 +139,16 @@ typedef struct { uint64_t offset; } VFIOUserRegionInfo; =20 +/* + * VFIO_USER_REGION_READ + * VFIO_USER_REGION_WRITE + */ +typedef struct { + VFIOUserHdr hdr; + uint64_t offset; + uint32_t region; + uint32_t count; + char data[]; +} VFIOUserRegionRW; + #endif /* VFIO_USER_PROTOCOL_H */ diff --git a/hw/vfio/user.h b/hw/vfio/user.h index e6485dc..3012a86 100644 --- a/hw/vfio/user.h +++ b/hw/vfio/user.h @@ -84,6 +84,7 @@ typedef struct VFIOUserProxy { /* VFIOProxy flags */ #define VFIO_PROXY_CLIENT 0x1 #define VFIO_PROXY_FORCE_QUEUED 0x4 +#define VFIO_PROXY_NO_POST 0x8 =20 VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp); void vfio_user_disconnect(VFIOUserProxy *proxy); diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h index 9fb4c80..bbc4b15 100644 --- a/include/hw/vfio/vfio-common.h +++ b/include/hw/vfio/vfio-common.h @@ -57,6 +57,7 @@ typedef struct VFIORegion { VFIOMmap *mmaps; uint8_t nr; /* cache the region number for debug */ int fd; /* fd to mmap() region */ + bool post_wr; /* writes can be posted */ } VFIORegion; =20 typedef struct VFIOMigration { @@ -180,7 
+181,7 @@ struct VFIODeviceIO { int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t s= ize, void *data); int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t = size, - void *data); + void *data, bool post); }; =20 struct VFIOContainerIO { diff --git a/hw/vfio/common.c b/hw/vfio/common.c index d26b325..de64e53 100644 --- a/hw/vfio/common.c +++ b/hw/vfio/common.c @@ -215,6 +215,7 @@ void vfio_region_write(void *opaque, hwaddr addr, uint32_t dword; uint64_t qword; } buf; + bool post =3D region->post_wr; int ret; =20 switch (size) { @@ -235,12 +236,19 @@ void vfio_region_write(void *opaque, hwaddr addr, break; } =20 - ret =3D vbasedev->io->region_write(vbasedev, region->nr, addr, size, &= buf); + /* read-after-write hazard if guest can directly access region */ + if (region->nr_mmaps) { + post =3D false; + } + ret =3D vbasedev->io->region_write(vbasedev, region->nr, addr, size, &= buf, + post); if (ret !=3D size) { + const char *err =3D ret < 0 ? strerror(-ret) : "short write"; + error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64 - ",%d) failed: %m", + ",%d) failed: %s", __func__, vbasedev->name, region->nr, - addr, data, size); + addr, data, size, err); } trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size); =20 @@ -271,9 +279,11 @@ uint64_t vfio_region_read(void *opaque, =20 ret =3D vbasedev->io->region_read(vbasedev, region->nr, addr, size, &b= uf); if (ret !=3D size) { - error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m", + const char *err =3D ret < 0 ? 
strerror(-ret) : "short read"; + + error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s", __func__, vbasedev->name, region->nr, - addr, size); + addr, size, err); return (uint64_t)-1; } =20 @@ -1584,6 +1594,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbased= ev, VFIORegion *region, region->size =3D info->size; region->fd_offset =3D info->offset; region->nr =3D index; + region->post_wr =3D false; if (vbasedev->regfds !=3D NULL) { region->fd =3D vbasedev->regfds[index]; } else { @@ -2711,7 +2722,7 @@ static int vfio_io_region_read(VFIODevice *vbasedev, = uint8_t index, off_t off, } =20 static int vfio_io_region_write(VFIODevice *vbasedev, uint8_t index, off_t= off, - uint32_t size, void *data) + uint32_t size, void *data, bool post) { struct vfio_region_info *info =3D vbasedev->regions[index]; int ret; diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index 935d247..be714b7 100644 --- a/hw/vfio/pci.c +++ b/hw/vfio/pci.c @@ -49,7 +49,7 @@ (off), (size), (data))) #define VDEV_CONFIG_WRITE(vbasedev, off, size, data) \ ((vbasedev)->io->region_write((vbasedev), VFIO_PCI_CONFIG_REGION_INDEX= , \ - (off), (size), (data))) + (off), (size), (data), false)) =20 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug" =20 @@ -1702,6 +1702,9 @@ static void vfio_bar_prepare(VFIOPCIDevice *vdev, int= nr) bar->type =3D pci_bar & (bar->ioport ? 
~PCI_BASE_ADDRESS_IO_MASK : ~PCI_BASE_ADDRESS_MEM_MASK); bar->size =3D bar->region.size; + + /* IO regions are sync, memory can be async */ + bar->region.post_wr =3D (bar->ioport =3D=3D 0); } =20 static void vfio_bars_prepare(VFIOPCIDevice *vdev) diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c index 09c6c98..900ab5f 100644 --- a/hw/vfio/user-pci.c +++ b/hw/vfio/user-pci.c @@ -41,6 +41,7 @@ struct VFIOUserPCIDevice { VFIOPCIDevice device; char *sock_name; bool send_queued; /* all sends are queued */ + bool no_post; /* all regions write are sync */ }; =20 /* @@ -105,6 +106,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Erro= r **errp) if (udev->send_queued) { proxy->flags |=3D VFIO_PROXY_FORCE_QUEUED; } + if (udev->no_post) { + proxy->flags |=3D VFIO_PROXY_NO_POST; + } =20 vfio_user_validate_version(proxy, &err); if (err !=3D NULL) { @@ -145,6 +149,7 @@ static void vfio_user_instance_finalize(Object *obj) static Property vfio_user_pci_dev_properties[] =3D { DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name), DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, fals= e), + DEFINE_PROP_BOOL("x-no-posted-writes", VFIOUserPCIDevice, no_post, fal= se), DEFINE_PROP_END_OF_LIST(), }; =20 diff --git a/hw/vfio/user.c b/hw/vfio/user.c index a05ba80..389c807 100644 --- a/hw/vfio/user.c +++ b/hw/vfio/user.c @@ -58,6 +58,8 @@ static void vfio_user_cb(void *opaque); =20 static void vfio_user_request(void *opaque); static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg); +static void vfio_user_send_async(VFIOUserProxy *proxy, VFIOUserHdr *hdr, + VFIOUserFDs *fds); static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr, VFIOUserFDs *fds, int rsize, bool nobql); static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd, @@ -631,6 +633,33 @@ static int vfio_user_send_queued(VFIOUserProxy *proxy,= VFIOUserMsg *msg) return 0; } =20 +/* + * async send - msg can be queued, but will be freed when sent + */ +static 
void vfio_user_send_async(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                          VFIOUserFDs *fds)
+{
+    VFIOUserMsg *msg;
+    int ret;
+
+    if (!(hdr->flags & (VFIO_USER_NO_REPLY | VFIO_USER_REPLY))) {
+        error_printf("vfio_user_send_async on sync message\n");
+        return;
+    }
+
+    QEMU_LOCK_GUARD(&proxy->lock);
+
+    msg = vfio_user_getmsg(proxy, hdr, fds);
+    msg->id = hdr->id;
+    msg->rsize = 0;
+    msg->type = VFIO_MSG_ASYNC;
+
+    ret = vfio_user_send_queued(proxy, msg);
+    if (ret < 0) {
+        vfio_user_recycle(proxy, msg);
+    }
+}
+
 static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                 VFIOUserFDs *fds, int rsize, bool nobql)
 {
@@ -1179,6 +1208,73 @@ static int vfio_user_get_region_info(VFIOUserProxy *proxy,
     return 0;
 }
 
+static int vfio_user_region_read(VFIOUserProxy *proxy, uint8_t index,
+                                 off_t offset, uint32_t count, void *data)
+{
+    g_autofree VFIOUserRegionRW *msgp = NULL;
+    int size = sizeof(*msgp) + count;
+
+    if (count > proxy->max_xfer_size) {
+        return -EINVAL;
+    }
+
+    msgp = g_malloc0(size);
+    vfio_user_request_msg(&msgp->hdr, VFIO_USER_REGION_READ, sizeof(*msgp), 0);
+    msgp->offset = offset;
+    msgp->region = index;
+    msgp->count = count;
+    trace_vfio_user_region_rw(msgp->region, msgp->offset, msgp->count);
+
+    vfio_user_send_wait(proxy, &msgp->hdr, NULL, size, false);
+    if (msgp->hdr.flags & VFIO_USER_ERROR) {
+        return -msgp->hdr.error_reply;
+    } else if (msgp->count > count) {
+        return -E2BIG;
+    } else {
+        memcpy(data, &msgp->data, msgp->count);
+    }
+
+    return msgp->count;
+}
+
+static int vfio_user_region_write(VFIOUserProxy *proxy, uint8_t index,
+                                  off_t offset, uint32_t count, void *data,
+                                  bool post)
+{
+    VFIOUserRegionRW *msgp = NULL;
+    int flags = post ? VFIO_USER_NO_REPLY : 0;
+    int size = sizeof(*msgp) + count;
+    int ret;
+
+    if (count > proxy->max_xfer_size) {
+        return -EINVAL;
+    }
+
+    msgp = g_malloc0(size);
+    vfio_user_request_msg(&msgp->hdr, VFIO_USER_REGION_WRITE, size, flags);
+    msgp->offset = offset;
+    msgp->region = index;
+    msgp->count = count;
+    memcpy(&msgp->data, data, count);
+    trace_vfio_user_region_rw(msgp->region, msgp->offset, msgp->count);
+
+    /* async send will free msg after it's sent */
+    if (post && !(proxy->flags & VFIO_PROXY_NO_POST)) {
+        vfio_user_send_async(proxy, &msgp->hdr, NULL);
+        return count;
+    }
+
+    vfio_user_send_wait(proxy, &msgp->hdr, NULL, 0, false);
+    if (msgp->hdr.flags & VFIO_USER_ERROR) {
+        ret = -msgp->hdr.error_reply;
+    } else {
+        ret = count;
+    }
+
+    g_free(msgp);
+    return ret;
+}
+
 
 /*
  * Socket-based io_ops
@@ -1208,6 +1304,22 @@ static int vfio_user_io_get_region_info(VFIODevice *vbasedev,
     return 0;
 }
 
+static int vfio_user_io_region_read(VFIODevice *vbasedev, uint8_t index,
+                                    off_t off, uint32_t size, void *data)
+{
+    return vfio_user_region_read(vbasedev->proxy, index, off, size, data);
+}
+
+static int vfio_user_io_region_write(VFIODevice *vbasedev, uint8_t index,
+                                     off_t off, unsigned size, void *data,
+                                     bool post)
+{
+    return vfio_user_region_write(vbasedev->proxy, index, off, size, data,
+                                  post);
+}
+
 VFIODeviceIO vfio_dev_io_sock = {
     .get_region_info = vfio_user_io_get_region_info,
+    .region_read = vfio_user_io_region_read,
+    .region_write = vfio_user_io_region_write,
 };
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 939113a..1f3688f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -175,3 +175,4 @@ vfio_user_send_write(uint16_t id, int wrote) " id 0x%x wrote 0x%x"
 vfio_user_version(uint16_t major, uint16_t minor, const char *caps) " major %d minor %d caps: %s"
 vfio_user_get_info(uint32_t nregions, uint32_t nirqs) " #regions %d #irqs %d"
 vfio_user_get_region_info(uint32_t index, uint32_t flags, uint64_t size) " index %d flags 0x%x size 0x%"PRIx64
+vfio_user_region_rw(uint32_t region, uint64_t off, uint32_t count) " region %d offset 0x%"PRIx64" count %d"
-- 
1.9.4
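The region read/write helpers above build one message per access: a fixed header with region, offset and count fields, followed by `count` bytes of payload for a write. A read sends only the fixed part and sizes the reply buffer for the payload. The sketch below illustrates that layout and the patch's two size checks: a request larger than `max_xfer_size` is rejected, and so is a reply that claims more bytes than were requested. `SketchHdr`, `SketchRegionRW` and both helper functions are hypothetical stand-ins, not QEMU's definitions. The real field order and padding in `hw/vfio/user-protocol.h` may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the vfio-user message header (VFIOUserHdr). */
typedef struct {
    uint16_t id;
    uint16_t command;
    uint32_t size;        /* total message size, header included */
    uint32_t flags;
    uint32_t error_reply;
} SketchHdr;

/* Hypothetical stand-in for VFIOUserRegionRW: fixed part plus payload. */
typedef struct {
    SketchHdr hdr;
    uint64_t offset;
    uint32_t region;
    uint32_t count;
    uint8_t data[];       /* flexible array member: `count` payload bytes */
} SketchRegionRW;

/*
 * Build a region-write request carrying `count` bytes of payload.
 * Returns a calloc()ed message (caller frees) or NULL with *err set.
 */
static SketchRegionRW *sketch_build_write(uint32_t region, uint64_t offset,
                                          const void *data, uint32_t count,
                                          uint32_t max_xfer_size, int *err)
{
    size_t size = sizeof(SketchRegionRW) + count;
    SketchRegionRW *msg;

    assert(err != NULL);
    if (count > max_xfer_size) {
        *err = -EINVAL;           /* same limit check as the patch */
        return NULL;
    }
    msg = calloc(1, size);
    if (msg == NULL) {
        *err = -ENOMEM;
        return NULL;
    }
    msg->hdr.size = (uint32_t)size;
    msg->offset = offset;
    msg->region = region;
    msg->count = count;
    memcpy(msg->data, data, count);
    *err = 0;
    return msg;
}

/*
 * Accept a region-read reply only if the server returned no more bytes
 * than were requested; copy the payload out and return its length.
 */
static int sketch_check_read_reply(const SketchRegionRW *reply,
                                   uint32_t requested, void *out)
{
    if (reply->count > requested) {
        return -E2BIG;
    }
    memcpy(out, reply->data, reply->count);
    return (int)reply->count;
}
```

In the patch, a posted write (`post` true and `VFIO_PROXY_NO_POST` clear) is handed to `vfio_user_send_async()`, which frees the message after sending. The synchronous path is therefore the only one that calls `g_free()` itself.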

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 13/23] vfio-user: pci_user_realize PCI setup
Date: Wed, 1 Feb 2023 21:55:49 -0800
Message-Id: <863fc4e015e11ddf38e0c9272499a56498e6b65f.1675228037.git.john.g.johnson@oracle.com>

PCI BARs read from remote device
PCI config reads/writes sent to remote server

Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/pci.h      |   9 ++
 hw/vfio/pci.c      | 257 ++++++++++++++++++++++++++++++-----------------------
 hw/vfio/user-pci.c |  47 ++++++++++
 3 files changed, 203 insertions(+), 110 deletions(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 50a1d07..4f70664 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -212,7 +212,16 @@ void vfio_intx_eoi(VFIODevice *vbasedev);
 Object *vfio_pci_get_object(VFIODevice *vbasedev);
 void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f);
 int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f);
+void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp);
+void vfio_teardown_msi(VFIOPCIDevice *vdev);
+void vfio_bars_exit(VFIOPCIDevice *vdev);
+void vfio_bars_finalize(VFIOPCIDevice *vdev);
+int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp);
 void vfio_put_device(VFIOPCIDevice *vdev);
+void vfio_register_err_notifier(VFIOPCIDevice *vdev);
+void vfio_register_req_notifier(VFIOPCIDevice *vdev);
+void vfio_pci_config_setup(VFIOPCIDevice *vdev, Error **errp);
+int vfio_interrupt_setup(VFIOPCIDevice *vdev, Error **errp);
 void vfio_instance_init(Object *obj);
 
 uint64_t vfio_vga_read(void *opaque, hwaddr addr, unsigned size);
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index be714b7..b8b5c34 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1651,7 +1651,7 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
     return 0;
 }
 
-static void vfio_teardown_msi(VFIOPCIDevice *vdev)
+void vfio_teardown_msi(VFIOPCIDevice *vdev)
 {
     msi_uninit(&vdev->pdev);
 
@@ -1751,7 +1751,7 @@ static void vfio_bars_register(VFIOPCIDevice *vdev)
     }
 }
 
-static void vfio_bars_exit(VFIOPCIDevice *vdev)
+void vfio_bars_exit(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -1771,7 +1771,7 @@ static void vfio_bars_exit(VFIOPCIDevice *vdev)
     }
 }
 
-static void vfio_bars_finalize(VFIOPCIDevice *vdev)
+void vfio_bars_finalize(VFIOPCIDevice *vdev)
 {
     int i;
 
@@ -2185,7 +2185,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
     return;
 }
 
-static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
+int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
 {
     PCIDevice *pdev = &vdev->pdev;
     int ret;
@@ -2641,7 +2641,7 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
     return 0;
 }
 
-static void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
+void vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
     VFIODevice *vbasedev = &vdev->vbasedev;
     struct vfio_region_info *reg_info;
@@ -2758,7 +2758,7 @@ static void vfio_err_notifier_handler(void *opaque)
  * and continue after disabling error recovery support for the
  * device.
  */
-static void vfio_register_err_notifier(VFIOPCIDevice *vdev)
+void vfio_register_err_notifier(VFIOPCIDevice *vdev)
 {
     Error *err = NULL;
     int32_t fd;
@@ -2817,7 +2817,7 @@ static void vfio_req_notifier_handler(void *opaque)
     }
 }
 
-static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
+void vfio_register_req_notifier(VFIOPCIDevice *vdev)
 {
     struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
                                       .index = VFIO_PCI_REQ_IRQ_INDEX };
@@ -2872,6 +2872,133 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+void vfio_pci_config_setup(VFIOPCIDevice *vdev, Error **errp)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    Error *err = NULL;
+
+    /* vfio emulates a lot for us, but some bits need extra love */
+    vdev->emulated_config_bits = g_malloc0(vdev->config_size);
+
+    /* QEMU can choose to expose the ROM or not */
+    memset(vdev->emulated_config_bits + PCI_ROM_ADDRESS, 0xff, 4);
+    /* QEMU can also add or extend BARs */
+    memset(vdev->emulated_config_bits + PCI_BASE_ADDRESS_0, 0xff, 6 * 4);
+
+    /*
+     * The PCI spec reserves vendor ID 0xffff as an invalid value. The
+     * device ID is managed by the vendor and need only be a 16-bit value.
+     * Allow any 16-bit value for subsystem so they can be hidden or changed.
+     */
+    if (vdev->vendor_id != PCI_ANY_ID) {
+        if (vdev->vendor_id >= 0xffff) {
+            error_setg(errp, "invalid PCI vendor ID provided");
+            return;
+        }
+        vfio_add_emulated_word(vdev, PCI_VENDOR_ID, vdev->vendor_id, ~0);
+        trace_vfio_pci_emulated_vendor_id(vdev->vbasedev.name, vdev->vendor_id);
+    } else {
+        vdev->vendor_id = pci_get_word(pdev->config + PCI_VENDOR_ID);
+    }
+
+    if (vdev->device_id != PCI_ANY_ID) {
+        if (vdev->device_id > 0xffff) {
+            error_setg(errp, "invalid PCI device ID provided");
+            return;
+        }
+        vfio_add_emulated_word(vdev, PCI_DEVICE_ID, vdev->device_id, ~0);
+        trace_vfio_pci_emulated_device_id(vbasedev->name, vdev->device_id);
+    } else {
+        vdev->device_id = pci_get_word(pdev->config + PCI_DEVICE_ID);
+    }
+
+    if (vdev->sub_vendor_id != PCI_ANY_ID) {
+        if (vdev->sub_vendor_id > 0xffff) {
+            error_setg(errp, "invalid PCI subsystem vendor ID provided");
+            return;
+        }
+        vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_VENDOR_ID,
+                               vdev->sub_vendor_id, ~0);
+        trace_vfio_pci_emulated_sub_vendor_id(vbasedev->name,
+                                              vdev->sub_vendor_id);
+    }
+
+    if (vdev->sub_device_id != PCI_ANY_ID) {
+        if (vdev->sub_device_id > 0xffff) {
+            error_setg(errp, "invalid PCI subsystem device ID provided");
+            return;
+        }
+        vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_ID, vdev->sub_device_id, ~0);
+        trace_vfio_pci_emulated_sub_device_id(vbasedev->name,
+                                              vdev->sub_device_id);
+    }
+
+    /* QEMU can change multi-function devices to single function, or reverse */
+    vdev->emulated_config_bits[PCI_HEADER_TYPE] =
+                                              PCI_HEADER_TYPE_MULTI_FUNCTION;
+
+    /* Restore or clear multifunction, this is always controlled by QEMU */
+    if (vdev->pdev.cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
+        vdev->pdev.config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
+    } else {
+        vdev->pdev.config[PCI_HEADER_TYPE] &= ~PCI_HEADER_TYPE_MULTI_FUNCTION;
+    }
+
+    /*
+     * Clear host resource mapping info. If we choose not to register a
+     * BAR, such as might be the case with the option ROM, we can get
+     * confusing, unwritable, residual addresses from the host here.
+     */
+    memset(&vdev->pdev.config[PCI_BASE_ADDRESS_0], 0, 24);
+    memset(&vdev->pdev.config[PCI_ROM_ADDRESS], 0, 4);
+
+    vfio_pci_size_rom(vdev);
+
+    vfio_bars_prepare(vdev);
+
+    vfio_msix_early_setup(vdev, &err);
+    if (err) {
+        error_propagate(errp, err);
+        return;
+    }
+
+    vfio_bars_register(vdev);
+}
+
+int vfio_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    int ret;
+
+    /* QEMU emulates all of MSI & MSIX */
+    if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
+        memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
+               MSIX_CAP_LENGTH);
+    }
+
+    if (pdev->cap_present & QEMU_PCI_CAP_MSI) {
+        memset(vdev->emulated_config_bits + pdev->msi_cap, 0xff,
+               vdev->msi_cap_size);
+    }
+
+    if (vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1)) {
+        vdev->intx.mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                                             vfio_intx_mmap_enable, vdev);
+        pci_device_set_intx_routing_notifier(&vdev->pdev,
+                                             vfio_intx_routing_notifier);
+        vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
+        kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
+        ret = vfio_intx_enable(vdev, errp);
+        if (ret) {
+            pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
+            kvm_irqchip_remove_change_notifier(&vdev->irqchip_change_notifier);
+            return ret;
+        }
+    }
+    return 0;
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
@@ -2989,92 +3116,16 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         goto error;
     }
 
-    /* vfio emulates a lot for us, but some bits need extra love */
-    vdev->emulated_config_bits = g_malloc0(vdev->config_size);
-
-    /* QEMU can choose to expose the ROM or not */
-    memset(vdev->emulated_config_bits + PCI_ROM_ADDRESS, 0xff, 4);
-    /* QEMU can also add or extend BARs */
-    memset(vdev->emulated_config_bits + PCI_BASE_ADDRESS_0, 0xff, 6 * 4);
-
-    /*
-     * The PCI spec reserves vendor ID 0xffff as an invalid value. The
-     * device ID is managed by the vendor and need only be a 16-bit value.
-     * Allow any 16-bit value for subsystem so they can be hidden or changed.
-     */
-    if (vdev->vendor_id != PCI_ANY_ID) {
-        if (vdev->vendor_id >= 0xffff) {
-            error_setg(errp, "invalid PCI vendor ID provided");
-            goto error;
-        }
-        vfio_add_emulated_word(vdev, PCI_VENDOR_ID, vdev->vendor_id, ~0);
-        trace_vfio_pci_emulated_vendor_id(vbasedev->name, vdev->vendor_id);
-    } else {
-        vdev->vendor_id = pci_get_word(pdev->config + PCI_VENDOR_ID);
-    }
-
-    if (vdev->device_id != PCI_ANY_ID) {
-        if (vdev->device_id > 0xffff) {
-            error_setg(errp, "invalid PCI device ID provided");
-            goto error;
-        }
-        vfio_add_emulated_word(vdev, PCI_DEVICE_ID, vdev->device_id, ~0);
-        trace_vfio_pci_emulated_device_id(vbasedev->name, vdev->device_id);
-    } else {
-        vdev->device_id = pci_get_word(pdev->config + PCI_DEVICE_ID);
-    }
-
-    if (vdev->sub_vendor_id != PCI_ANY_ID) {
-        if (vdev->sub_vendor_id > 0xffff) {
-            error_setg(errp, "invalid PCI subsystem vendor ID provided");
-            goto error;
-        }
-        vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_VENDOR_ID,
-                               vdev->sub_vendor_id, ~0);
-        trace_vfio_pci_emulated_sub_vendor_id(vbasedev->name,
-                                              vdev->sub_vendor_id);
-    }
-
-    if (vdev->sub_device_id != PCI_ANY_ID) {
-        if (vdev->sub_device_id > 0xffff) {
-            error_setg(errp, "invalid PCI subsystem device ID provided");
-            goto error;
-        }
-        vfio_add_emulated_word(vdev, PCI_SUBSYSTEM_ID, vdev->sub_device_id, ~0);
-        trace_vfio_pci_emulated_sub_device_id(vbasedev->name,
-                                              vdev->sub_device_id);
-    }
-
-    /* QEMU can change multi-function devices to single function, or reverse */
-    vdev->emulated_config_bits[PCI_HEADER_TYPE] =
-                                              PCI_HEADER_TYPE_MULTI_FUNCTION;
-
-    /* Restore or clear multifunction, this is always controlled by QEMU */
-    if (vdev->pdev.cap_present & QEMU_PCI_CAP_MULTIFUNCTION) {
-        vdev->pdev.config[PCI_HEADER_TYPE] |= PCI_HEADER_TYPE_MULTI_FUNCTION;
-    } else {
-        vdev->pdev.config[PCI_HEADER_TYPE] &= ~PCI_HEADER_TYPE_MULTI_FUNCTION;
-    }
-
-    /*
-     * Clear host resource mapping info. If we choose not to register a
-     * BAR, such as might be the case with the option ROM, we can get
-     * confusing, unwritable, residual addresses from the host here.
-     */
-    memset(&vdev->pdev.config[PCI_BASE_ADDRESS_0], 0, 24);
-    memset(&vdev->pdev.config[PCI_ROM_ADDRESS], 0, 4);
-
-    vfio_pci_size_rom(vdev);
-
-    vfio_bars_prepare(vdev);
-
-    vfio_msix_early_setup(vdev, &err);
+    vfio_pci_config_setup(vdev, &err);
     if (err) {
-        error_propagate(errp, err);
         goto error;
     }
 
-    vfio_bars_register(vdev);
+    /*
+     * vfio_pci_config_setup will have registered the device's BARs
+     * and setup any MSIX BARs, so errors after it succeeds must
+     * use out_teardown
+     */
 
     ret = vfio_add_capabilities(vdev, errp);
     if (ret) {
@@ -3115,29 +3166,15 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         }
     }
 
-    /* QEMU emulates all of MSI & MSIX */
-    if (pdev->cap_present & QEMU_PCI_CAP_MSIX) {
-        memset(vdev->emulated_config_bits + pdev->msix_cap, 0xff,
-               MSIX_CAP_LENGTH);
-    }
-
-    if (pdev->cap_present & QEMU_PCI_CAP_MSI) {
-        memset(vdev->emulated_config_bits + pdev->msi_cap, 0xff,
-               vdev->msi_cap_size);
+    ret = vfio_interrupt_setup(vdev, errp);
+    if (ret) {
+        goto out_teardown;
     }
 
-    if (vfio_pci_read_config(&vdev->pdev, PCI_INTERRUPT_PIN, 1)) {
-        vdev->intx.mmap_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
-                                             vfio_intx_mmap_enable, vdev);
-        pci_device_set_intx_routing_notifier(&vdev->pdev,
-                                             vfio_intx_routing_notifier);
-        vdev->irqchip_change_notifier.notify = vfio_irqchip_change;
-        kvm_irqchip_add_change_notifier(&vdev->irqchip_change_notifier);
-        ret = vfio_intx_enable(vdev, errp);
-        if (ret) {
-            goto out_deregister;
-        }
-    }
+    /*
+     * vfio_interrupt_setup will have setup INTx's KVM routing
+     * so errors after it succeeds must use out_deregister
+     */
 
     if (vdev->display != ON_OFF_AUTO_OFF) {
         ret = vfio_display_probe(vdev, errp);
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index 900ab5f..55ffe7f 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -128,8 +128,51 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
         goto error;
     }
 
+    vfio_populate_device(vdev, &err);
+    if (err) {
+        error_propagate(errp, err);
+        goto error;
+    }
+
+    /* Get a copy of config space */
+    ret = vbasedev->io->region_read(vbasedev, VFIO_PCI_CONFIG_REGION_INDEX, 0,
+                               MIN(pci_config_size(pdev), vdev->config_size),
+                               pdev->config);
+    if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
+        error_setg_errno(errp, -ret, "failed to read device config space");
+        goto error;
+    }
+
+    vfio_pci_config_setup(vdev, &err);
+    if (err) {
+        error_propagate(errp, err);
+        goto error;
+    }
+
+    /*
+     * vfio_pci_config_setup will have registered the device's BARs
+     * and setup any MSIX BARs, so errors after it succeeds must
+     * use out_teardown
+     */
+
+    ret = vfio_add_capabilities(vdev, errp);
+    if (ret) {
+        goto out_teardown;
+    }
+
+    ret = vfio_interrupt_setup(vdev, errp);
+    if (ret) {
+        goto out_teardown;
+    }
+
+    vfio_register_err_notifier(vdev);
+    vfio_register_req_notifier(vdev);
+
     return;
 
+out_teardown:
+    vfio_teardown_msi(vdev);
+    vfio_bars_exit(vdev);
 error:
     error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
 }
@@ -139,6 +182,10 @@ static void vfio_user_instance_finalize(Object *obj)
     VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
     VFIODevice *vbasedev = &vdev->vbasedev;
 
+    vfio_bars_finalize(vdev);
+    g_free(vdev->emulated_config_bits);
+    g_free(vdev->rom);
+
     vfio_put_device(vdev);
 
     if (vbasedev->proxy != NULL) {
-- 
1.9.4
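`vfio_pci_config_setup()` starts by marking the ROM BAR and the six BARs as emulated. It does this by setting bytes in `emulated_config_bits`. The sketch below shows one way such a mask can be applied on a config read: bits set in the mask come from QEMU's emulated copy, and all other bits come from the device. In this series, a device-side read means a config-space access forwarded to the remote server, as the commit message states. The merge is modelled loosely on what QEMU's `vfio_pci_read_config()` does, but the `sketch_*` names and the flat byte arrays are hypothetical simplifications. Offsets 0x10 and 0x30 are the standard PCI type 0 header offsets for BAR0 and the expansion ROM BAR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Standard PCI type 0 header offsets. */
#define SKETCH_PCI_BASE_ADDRESS_0  0x10
#define SKETCH_PCI_ROM_ADDRESS     0x30
#define SKETCH_CFG_SIZE            256   /* conventional config space */

/* Mirror the two memset() calls that open vfio_pci_config_setup(). */
static void sketch_mark_emulated(uint8_t bits[SKETCH_CFG_SIZE])
{
    memset(bits, 0, SKETCH_CFG_SIZE);
    memset(bits + SKETCH_PCI_ROM_ADDRESS, 0xff, 4);         /* ROM BAR */
    memset(bits + SKETCH_PCI_BASE_ADDRESS_0, 0xff, 6 * 4);  /* BAR0..BAR5 */
}

/*
 * Merge a little-endian config read of 1, 2 or 4 bytes: bits set in the
 * emulation mask come from QEMU's copy, all other bits from the device.
 */
static uint32_t sketch_config_read(const uint8_t *emu_bits,
                                   const uint8_t *emu_cfg,
                                   const uint8_t *dev_cfg,
                                   unsigned addr, unsigned len)
{
    uint32_t mask = 0, emu = 0, phys = 0;
    unsigned i;

    assert(len == 1 || len == 2 || len == 4);
    assert(addr + len <= SKETCH_CFG_SIZE);
    for (i = 0; i < len; i++) {
        mask |= (uint32_t)emu_bits[addr + i] << (8 * i);
        emu |= (uint32_t)emu_cfg[addr + i] << (8 * i);
        phys |= (uint32_t)dev_cfg[addr + i] << (8 * i);
    }
    return (emu & mask) | (phys & ~mask);
}
```

With this mask, a read of BAR0 returns QEMU's value, while a read of the vendor ID (offset 0x00, not marked) returns the device's value. A user-supplied `vendor_id` changes that, because `vfio_add_emulated_word()` marks those bytes as emulated.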

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 14/23] vfio-user: get and set IRQs
Date: Wed, 1 Feb 2023 21:55:50 -0800
Message-Id: <3cfa054d8622df1d5f2847fe29d298dad70cf571.1675228037.git.john.g.johnson@oracle.com>

Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user-protocol.h |  25 +++++++++
 hw/vfio/pci.c           |   3 +-
 hw/vfio/user.c          | 140 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events    |   2 +
 4 files changed, 169 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 6987435..48dd475 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -140,6 +140,31 @@ typedef struct {
 } VFIOUserRegionInfo;
 
 /*
+ * VFIO_USER_DEVICE_GET_IRQ_INFO
+ * imported from struct vfio_irq_info
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint32_t argsz;
+    uint32_t flags;
+    uint32_t index;
+    uint32_t count;
+} VFIOUserIRQInfo;
+
+/*
+ * VFIO_USER_DEVICE_SET_IRQS
+ * imported from struct vfio_irq_set
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint32_t argsz;
+    uint32_t flags;
+    uint32_t index;
+    uint32_t start;
+    uint32_t count;
+} VFIOUserIRQSet;
+
+/*
  * VFIO_USER_REGION_READ
  * VFIO_USER_REGION_WRITE
  */
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index b8b5c34..42e7c82 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -711,7 +711,8 @@ retry:
     ret = vfio_enable_vectors(vdev, false);
     if (ret) {
         if (ret < 0) {
-            error_report("vfio: Error: Failed to setup MSI fds: %m");
+            error_report("vfio: Error: Failed to setup MSI fds: %s",
+                         strerror(-ret));
         } else {
             error_report("vfio: Error: Failed to enable %d "
                          "MSI vectors, retry with %d", vdev->nr_vectors, ret);
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 389c807..d66dc1b 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1208,6 +1208,122 @@ static int vfio_user_get_region_info(VFIOUserProxy *proxy,
     return 0;
 }
 
+static int vfio_user_get_irq_info(VFIOUserProxy *proxy,
+                                  struct vfio_irq_info *info)
+{
+    VFIOUserIRQInfo msg;
+
+    memset(&msg, 0, sizeof(msg));
+    vfio_user_request_msg(&msg.hdr, VFIO_USER_DEVICE_GET_IRQ_INFO,
+                          sizeof(msg), 0);
+    msg.argsz = info->argsz;
+    msg.index = info->index;
+
+    vfio_user_send_wait(proxy, &msg.hdr, NULL, 0, false);
+    if (msg.hdr.flags & VFIO_USER_ERROR) {
+        return -msg.hdr.error_reply;
+    }
+    trace_vfio_user_get_irq_info(msg.index, msg.flags, msg.count);
+
+    memcpy(info, &msg.argsz, sizeof(*info));
+    return 0;
+}
+
+static int irq_howmany(int *fdp, uint32_t cur, uint32_t max)
+{
+    int n = 0;
+
+    if (fdp[cur] != -1) {
+        do {
+            n++;
+        } while (n < max && fdp[cur + n] != -1);
+    } else {
+        do {
+            n++;
+        } while (n < max && fdp[cur + n] == -1);
+    }
+
+    return n;
+}
+
+static int vfio_user_set_irqs(VFIOUserProxy *proxy, struct vfio_irq_set *irq)
+{
+    g_autofree VFIOUserIRQSet *msgp = NULL;
+    uint32_t size, nfds, send_fds, sent_fds, max;
+
+    if (irq->argsz < sizeof(*irq)) {
+        error_printf("vfio_user_set_irqs argsz too small\n");
+        return -EINVAL;
+    }
+
+    /*
+     * Handle simple case
+     */
+    if ((irq->flags & VFIO_IRQ_SET_DATA_EVENTFD) == 0) {
+        size = sizeof(VFIOUserHdr) + irq->argsz;
+        msgp = g_malloc0(size);
+
+        vfio_user_request_msg(&msgp->hdr, VFIO_USER_DEVICE_SET_IRQS, size, 0);
+        msgp->argsz = irq->argsz;
+        msgp->flags = irq->flags;
+        msgp->index = irq->index;
+        msgp->start = irq->start;
+        msgp->count = irq->count;
+        trace_vfio_user_set_irqs(msgp->index, msgp->start, msgp->count,
+                                 msgp->flags);
+
+        vfio_user_send_wait(proxy, &msgp->hdr, NULL, 0, false);
+        if (msgp->hdr.flags & VFIO_USER_ERROR) {
+            return -msgp->hdr.error_reply;
+        }
+
+        return 0;
+    }
+
+    /*
+     * Calculate the number of FDs to send
+     * and adjust argsz
+     */
+    nfds = (irq->argsz - sizeof(*irq)) / sizeof(int);
+    irq->argsz = sizeof(*irq);
+    msgp = g_malloc0(sizeof(*msgp));
+    /*
+     * Send in chunks if over max_send_fds
+     */
+    for (sent_fds = 0; nfds > sent_fds; sent_fds += send_fds) {
+        VFIOUserFDs *arg_fds, loop_fds;
+
+        /* must send all valid FDs or all invalid FDs in single msg */
+        max = nfds - sent_fds;
+        if (max > proxy->max_send_fds) {
+            max = proxy->max_send_fds;
+        }
+        send_fds = irq_howmany((int *)irq->data, sent_fds, max);
+
+        vfio_user_request_msg(&msgp->hdr, VFIO_USER_DEVICE_SET_IRQS,
+                              sizeof(*msgp), 0);
+        msgp->argsz = irq->argsz;
+        msgp->flags = irq->flags;
+        msgp->index = irq->index;
+        msgp->start = irq->start + sent_fds;
+        msgp->count = send_fds;
+        trace_vfio_user_set_irqs(msgp->index, msgp->start, msgp->count,
+                                 msgp->flags);
+
+        loop_fds.send_fds = send_fds;
+        loop_fds.recv_fds = 0;
+        loop_fds.fds = (int *)irq->data + sent_fds;
+        arg_fds = loop_fds.fds[0] != -1 ? &loop_fds : NULL;
+
+        vfio_user_send_wait(proxy, &msgp->hdr, arg_fds, 0, false);
+        if (msgp->hdr.flags & VFIO_USER_ERROR) {
+            return -msgp->hdr.error_reply;
+        }
+    }
+
+    return 0;
+}
+
 static int vfio_user_region_read(VFIOUserProxy *proxy, uint8_t index,
                                  off_t offset, uint32_t count, void *data)
 {
@@ -1304,6 +1420,28 @@ static int vfio_user_io_get_region_info(VFIODevice *vbasedev,
     return 0;
 }
 
+static int vfio_user_io_get_irq_info(VFIODevice *vbasedev,
+                                     struct vfio_irq_info *irq)
+{
+    int ret;
+
+    ret = vfio_user_get_irq_info(vbasedev->proxy, irq);
+    if (ret) {
+        return ret;
+    }
+
+    if (irq->index > vbasedev->num_irqs) {
+        return -EINVAL;
+    }
+    return 0;
+}
+
+static int vfio_user_io_set_irqs(VFIODevice *vbasedev,
+                                 struct vfio_irq_set *irqs)
+{
+    return vfio_user_set_irqs(vbasedev->proxy, irqs);
+}
+
 static int vfio_user_io_region_read(VFIODevice *vbasedev, uint8_t index,
                                     off_t off, uint32_t size, void *data)
 {
@@ -1320,6 +1458,8 @@ static int vfio_user_io_region_write(VFIODevice *vbasedev, uint8_t index,
 
 VFIODeviceIO vfio_dev_io_sock = {
     .get_region_info = vfio_user_io_get_region_info,
+    .get_irq_info = vfio_user_io_get_irq_info,
+    .set_irqs = vfio_user_io_set_irqs,
     .region_read = vfio_user_io_region_read,
     .region_write = vfio_user_io_region_write,
 };
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 1f3688f..01563cb 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -176,3 +176,5 @@ vfio_user_version(uint16_t major, uint16_t minor, const char *caps) " major %d m
 vfio_user_get_info(uint32_t nregions, uint32_t nirqs) " #regions %d #irqs %d"
 vfio_user_get_region_info(uint32_t index, uint32_t flags, uint64_t size) " index %d flags 0x%x size 0x%"PRIx64
 vfio_user_region_rw(uint32_t region, uint64_t off, uint32_t count) " region %d offset 0x%"PRIx64" count %d"
+vfio_user_get_irq_info(uint32_t index, uint32_t flags, uint32_t count) " index %d flags 0x%x count %d"
+vfio_user_set_irqs(uint32_t index, uint32_t start, uint32_t count, uint32_t flags) " index %d start %d count %d flags 0x%x"
-- 
1.9.4
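`vfio_user_set_irqs()` splits an eventfd array so that each `VFIO_USER_DEVICE_SET_IRQS` message carries either only valid fds or only `-1` entries, and never more than `max_send_fds` entries. The sketch below reuses the run-length logic of the patch's `irq_howmany()`, with `n` made unsigned to avoid a signed/unsigned comparison. It adds a hypothetical `sketch_plan_set_irqs()` that records the `(start, count)` pair of each message instead of sending it. Here `start` is relative to `irq->start`, whereas the patch adds the two together.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Adapted from the patch's irq_howmany(): length of the run starting at
 * fdp[cur] whose entries are all valid fds (!= -1) or all -1, capped at
 * max. Requires max >= 1.
 */
static uint32_t sketch_irq_howmany(const int *fdp, uint32_t cur, uint32_t max)
{
    uint32_t n = 0;

    if (fdp[cur] != -1) {
        do {
            n++;
        } while (n < max && fdp[cur + n] != -1);
    } else {
        do {
            n++;
        } while (n < max && fdp[cur + n] == -1);
    }
    return n;
}

/*
 * Hypothetical planner mirroring the patch's send loop: fills starts[] and
 * counts[] with one entry per message and returns the number of messages,
 * or -1 if more than max_msgs would be needed.
 */
static int sketch_plan_set_irqs(const int *fds, uint32_t nfds,
                                uint32_t max_send_fds, uint32_t *starts,
                                uint32_t *counts, int max_msgs)
{
    uint32_t sent, send;
    int nmsgs = 0;

    assert(max_send_fds > 0);   /* otherwise the loop could not advance */
    for (sent = 0; sent < nfds; sent += send) {
        uint32_t max = nfds - sent;

        if (max > max_send_fds) {
            max = max_send_fds;
        }
        send = sketch_irq_howmany(fds, sent, max);
        if (nmsgs == max_msgs) {
            return -1;
        }
        starts[nmsgs] = sent;
        counts[nmsgs] = send;
        nmsgs++;
    }
    return nmsgs;
}
```

For example, with `fds = {3, 4, -1, -1, 5}` and `max_send_fds = 2`, the planner yields three messages: (0, 2), (2, 2) and (4, 1). In the patch, only messages whose first entry is a valid fd pass a `VFIOUserFDs` to `vfio_user_send_wait()`; runs of `-1` are sent without attached fds.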
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 15/23] vfio-user: forward msix BAR accesses to server
Date: Wed, 1 Feb 2023 21:55:51 -0800
Message-Id: <0ad69e4ea3d1f37246ce5e32ba833d6c871e99b1.1675228037.git.john.g.johnson@oracle.com>

The server holds the device's current pending state.
Use IRQ masking commands in the socket case.

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 hw/vfio/pci.h                 |  1 +
 include/hw/vfio/vfio-common.h |  3 ++
 hw/vfio/ccw.c                 |  1 +
 hw/vfio/common.c              | 26 ++++++++++++++++++
 hw/vfio/pci.c                 | 23 +++++++++++++++-
 hw/vfio/platform.c            |  1 +
 hw/vfio/user-pci.c            | 64 +++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 4f70664..d3e5d5f 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
     uint32_t table_offset;
     uint32_t pba_offset;
     unsigned long *pending;
+    MemoryRegion *pba_region;
 } VFIOMSIXInfo;
 
 /*
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bbc4b15..2c58d7d 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -143,6 +143,7 @@ typedef struct VFIODevice {
     bool ram_block_discard_allowed;
     bool enable_migration;
     bool use_regfds;
+    bool can_mask_irq;
     VFIODeviceOps *ops;
     VFIODeviceIO *io;
     unsigned int num_irqs;
@@ -239,6 +240,8 @@ void vfio_put_base_device(VFIODevice *vbasedev);
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
 void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index);
+void vfio_unmask_single_irq(VFIODevice *vbasedev, int index, int irq);
+void vfio_mask_single_irq(VFIODevice *vbasedev, int index, int irq);
 int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
                            int action, int fd, Error **errp);
 void vfio_region_write(void *opaque, hwaddr addr,
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 00605bd..bf67670 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -616,6 +616,7 @@ static void vfio_ccw_get_device(VFIOGroup *group, VFIOCCWDevice *vcdev,
     vcdev->vdev.dev = &vcdev->cdev.parent_obj.parent_obj;
     vcdev->vdev.io = &vfio_dev_io_ioctl;
     vcdev->vdev.use_regfds = false;
+    vcdev->vdev.can_mask_irq = false;
 
     return;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index de64e53..0c1cb21 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -102,6 +102,32 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
     vbasedev->io->set_irqs(vbasedev, &irq_set);
 }
 
+void vfio_mask_single_irq(VFIODevice *vbasedev, int index, int irq)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_MASK,
+        .index = index,
+        .start = irq,
+        .count = 1,
+    };
+
+    vbasedev->io->set_irqs(vbasedev, &irq_set);
+}
+
+void vfio_unmask_single_irq(VFIODevice *vbasedev, int index, int irq)
+{
+    struct vfio_irq_set irq_set = {
+        .argsz = sizeof(irq_set),
+        .flags = VFIO_IRQ_SET_DATA_NONE | VFIO_IRQ_SET_ACTION_UNMASK,
+        .index = index,
+        .start = irq,
+        .count = 1,
+    };
+
+    vbasedev->io->set_irqs(vbasedev, &irq_set);
+}
+
 static inline const char *action_to_str(int action)
 {
     switch (action) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 42e7c82..7b16f8f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -477,6 +477,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
 {
     VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIOMSIVector *vector;
+    bool new_vec = false;
     int ret;
 
     trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
@@ -490,6 +491,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             error_report("vfio: Error: event_notifier_init failed");
         }
         vector->use = true;
+        new_vec = true;
         msix_vector_use(pdev, nr);
     }
 
@@ -516,6 +518,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                 kvm_irqchip_commit_route_changes(&vfio_route_change);
                 vfio_connect_kvm_msi_virq(vector);
             }
+            new_vec = true;
         }
     }
 
@@ -523,6 +526,8 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
      * We don't want to have the host allocate all possible MSI vectors
      * for a device if they're not in use, so we shutdown and incrementally
      * increase them as needed.
+     * Otherwise, unmask the vector if the vector is already set up (and we can
+     * do so) or send the fd if not.
      */
     if (vdev->nr_vectors < nr + 1) {
         vdev->nr_vectors = nr + 1;
@@ -533,6 +538,8 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                 error_report("vfio: failed to enable vectors, %d", ret);
             }
         }
+    } else if (vdev->vbasedev.can_mask_irq && !new_vec) {
+        vfio_unmask_single_irq(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX, nr);
     } else {
         Error *err = NULL;
         int32_t fd;
@@ -574,6 +581,12 @@ static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 
     trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
 
+    /* just mask vector if peer supports it */
+    if (vdev->vbasedev.can_mask_irq) {
+        vfio_mask_single_irq(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX, nr);
+        return;
+    }
+
     /*
      * There are still old guests that mask and unmask vectors on every
      * interrupt. If we're using QEMU bypass with a KVM irqfd, leave all of
@@ -644,7 +657,7 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
         if (ret) {
             error_report("vfio: failed to enable vectors, %d", ret);
         }
-    } else {
+    } else if (!vdev->vbasedev.can_mask_irq) {
         /*
          * Some communication channels between VF & PF or PF & fw rely on the
          * physical state of the device and expect that enabling MSI-X from the
@@ -660,6 +673,13 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
          */
         vfio_msix_vector_do_use(&vdev->pdev, 0, NULL, NULL);
         vfio_msix_vector_release(&vdev->pdev, 0);
+    } else {
+        /*
+         * If we can use irq masking, send an invalid fd on vector 0
+         * to enable MSI-X without any vectors enabled.
+         */
+        vfio_set_irq_signaling(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX, 0,
+                               VFIO_IRQ_SET_ACTION_TRIGGER, -1, NULL);
     }
 
     trace_vfio_msix_enable(vdev->vbasedev.name);
@@ -3040,6 +3060,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vbasedev->dev = DEVICE(vdev);
     vbasedev->io = &vfio_dev_io_ioctl;
     vbasedev->use_regfds = false;
+    vbasedev->can_mask_irq = false;
 
     tmp = g_strdup_printf("%s/iommu_group", vbasedev->sysfsdev);
     len = readlink(tmp, group_path, sizeof(group_path));
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 8ddfcca..3387ec4 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -623,6 +623,7 @@ static void vfio_platform_realize(DeviceState *dev, Error **errp)
     vbasedev->ops = &vfio_platform_ops;
     vbasedev->io = &vfio_dev_io_ioctl;
     vbasedev->use_regfds = false;
+    vbasedev->can_mask_irq = false;
 
     qemu_mutex_init(&vdev->intp_mutex);
 
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index 55ffe7f..bc1d01a 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -45,6 +45,62 @@ struct VFIOUserPCIDevice {
 };
 
 /*
+ * The server maintains the device's pending interrupts,
+ * via its MSIX table and PBA, so we treat these accesses
+ * like PCI config space and forward them.
+ */
+static uint64_t vfio_user_pba_read(void *opaque, hwaddr addr,
+                                   unsigned size)
+{
+    VFIOPCIDevice *vdev = opaque;
+    VFIORegion *region = &vdev->bars[vdev->msix->pba_bar].region;
+    uint64_t data;
+
+    /* server copy is what matters */
+    data = vfio_region_read(region, addr + vdev->msix->pba_offset, size);
+    return data;
+}
+
+static void vfio_user_pba_write(void *opaque, hwaddr addr,
+                                uint64_t data, unsigned size)
+{
+    /* dropped */
+}
+
+static const MemoryRegionOps vfio_user_pba_ops = {
+    .read = vfio_user_pba_read,
+    .write = vfio_user_pba_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+};
+
+static void vfio_user_msix_setup(VFIOPCIDevice *vdev)
+{
+    MemoryRegion *vfio_reg, *msix_reg, *pba_reg;
+
+    pba_reg = g_new0(MemoryRegion, 1);
+    vdev->msix->pba_region = pba_reg;
+
+    vfio_reg = vdev->bars[vdev->msix->pba_bar].mr;
+    msix_reg = &vdev->pdev.msix_pba_mmio;
+    memory_region_init_io(pba_reg, OBJECT(vdev), &vfio_user_pba_ops, vdev,
+                          "VFIO MSIX PBA", int128_get64(msix_reg->size));
+    memory_region_add_subregion_overlap(vfio_reg, vdev->msix->pba_offset,
+                                        pba_reg, 1);
+}
+
+static void vfio_user_msix_teardown(VFIOPCIDevice *vdev)
+{
+    MemoryRegion *mr, *sub;
+
+    mr = vdev->bars[vdev->msix->pba_bar].mr;
+    sub = vdev->msix->pba_region;
+    memory_region_del_subregion(mr, sub);
+
+    g_free(vdev->msix->pba_region);
+    vdev->msix->pba_region = NULL;
+}
+
+/*
  * Incoming request message callback.
  *
  * Runs off main loop, so BQL held.
@@ -122,6 +178,7 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     vbasedev->dev = DEVICE(vdev);
     vbasedev->io = &vfio_dev_io_sock;
     vbasedev->use_regfds = true;
+    vbasedev->can_mask_irq = true;
 
     ret = vfio_user_get_device(vbasedev, errp);
     if (ret) {
@@ -159,6 +216,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     if (ret) {
         goto out_teardown;
     }
+    if (vdev->msix != NULL) {
+        vfio_user_msix_setup(vdev);
+    }
 
     ret = vfio_interrupt_setup(vdev, errp);
     if (ret) {
@@ -186,6 +246,10 @@ static void vfio_user_instance_finalize(Object *obj)
     g_free(vdev->emulated_config_bits);
     g_free(vdev->rom);
 
+    if (vdev->msix != NULL) {
+        vfio_user_msix_teardown(vdev);
+    }
+
     vfio_put_device(vdev);
 
     if (vbasedev->proxy != NULL) {
-- 
1.9.4
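A standalone sketch of the two rules this patch introduces, for readers following the MSI-X changes: `msix_choose_action()` restates the three-way branch added to vfio_msix_vector_do_use(), and `pba_vector_pending()` shows how a pending bit would be located in the PBA bytes the server returns through vfio_user_pba_read(). Both helpers are hypothetical and use plain integers instead of QEMU types; the PBA layout assumed is the PCI one, with vector n at bit n of a little-endian bit array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* What vfio_msix_vector_do_use() does for vector `nr` after this patch. */
enum msix_use_action {
    MSIX_GROW_AND_REENABLE, /* nr beyond nr_vectors: grow to nr + 1 vectors */
    MSIX_UNMASK,            /* peer can mask and vector was already set up */
    MSIX_SEND_FD,           /* (re)send the eventfd for this vector */
};

static enum msix_use_action msix_choose_action(unsigned int nr,
                                               unsigned int nr_vectors,
                                               bool can_mask_irq,
                                               bool new_vec)
{
    if (nr_vectors < nr + 1) {
        return MSIX_GROW_AND_REENABLE;
    }
    if (can_mask_irq && !new_vec) {
        return MSIX_UNMASK;
    }
    return MSIX_SEND_FD;
}

/*
 * Is MSI-X vector `nr` pending in a PBA image read from the server?
 * Vector n is bit (n % 8) of byte (n / 8); out-of-range vectors read as 0.
 */
static bool pba_vector_pending(const uint8_t *pba, size_t pba_len,
                               unsigned int nr)
{
    assert(pba != NULL || pba_len == 0);
    if (nr / 8 >= pba_len) {
        return false;
    }
    return (pba[nr / 8] >> (nr % 8)) & 1;
}
```

The release path is the mirror image: when `can_mask_irq` is set, the patch masks the single vector and returns instead of tearing the vector down.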
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 16/23] vfio-user: proxy container connect/disconnect
Date: Wed, 1 Feb 2023 21:55:52 -0800

Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user.h                |   6 +-
 include/hw/vfio/vfio-common.h |  10 +++
 hw/vfio/common.c              | 100 ++++++++++++++++-----------
 hw/vfio/user-pci.c            |  12 +++-
 hw/vfio/user.c                | 152 +++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 237 insertions(+), 43 deletions(-)

diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 3012a86..b89e5ca 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -91,9 +91,13 @@ void vfio_user_disconnect(VFIOUserProxy *proxy);
 void vfio_user_set_handler(VFIODevice *vbasedev,
                            void (*handler)(void *opaque, VFIOUserMsg *msg),
                            void *reqarg);
-int vfio_user_get_device(VFIODevice *vbasedev, Error **errp);
+int vfio_user_get_device(VFIOGroup *group, VFIODevice *vbasedev, Error **errp);
+VFIOGroup *vfio_user_get_group(VFIOUserProxy *proxy, AddressSpace *as,
+                               Error **errp);
+void vfio_user_put_group(VFIOGroup *group);
 int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp);
 
 extern VFIODeviceIO vfio_dev_io_sock;
+extern VFIOContainerIO vfio_cont_io_sock;
 
 #endif /* VFIO_USER_H */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 2c58d7d..b0c4453 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -94,6 +94,7 @@ typedef struct VFIOContainer {
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
     unsigned int dma_max_mappings;
+    VFIOUserProxy *proxy;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
@@ -236,6 +237,7 @@ typedef struct VFIODisplay {
     } dmabuf;
 } VFIODisplay;
 
+int vfio_ram_block_discard_disable(VFIOContainer *container, bool state);
 void vfio_put_base_device(VFIODevice *vbasedev);
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
@@ -244,6 +246,9 @@ void vfio_unmask_single_irq(VFIODevice *vbasedev, int index, int irq);
 void vfio_mask_single_irq(VFIODevice *vbasedev, int index, int irq);
 int vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
                            int action, int fd, Error **errp);
+void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
+                       hwaddr max_iova, uint64_t iova_pgsizes);
+void vfio_listener_release(VFIOContainer *container);
 void vfio_region_write(void *opaque, hwaddr addr,
                            uint64_t data, unsigned size);
 uint64_t vfio_region_read(void *opaque,
@@ -256,11 +261,16 @@ void vfio_region_unmap(VFIORegion *region);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
+VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
+void vfio_put_address_space(VFIOAddressSpace *space);
 VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp);
 void vfio_put_group(VFIOGroup *group);
 int vfio_get_device(VFIOGroup *group, const char *name,
                     VFIODevice *vbasedev, Error **errp);
 
+VFIOContainer *vfio_new_container(VFIOAddressSpace *space);
+void vfio_link_container(VFIOContainer *container, VFIOGroup *group);
+void vfio_unmap_container(VFIOContainer *container);
 void vfio_init_device(VFIODevice *vbasedev, VFIOGroup *group,
                       struct vfio_device_info *info);
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0c1cb21..6f99907 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -164,7 +164,7 @@ static const char *index_to_str(VFIODevice *vbasedev, int index)
     }
 }
 
-static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
+int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
 {
     switch (container->iommu_type) {
     case VFIO_TYPE1v2_IOMMU:
@@ -532,7 +532,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
     return ret;
 }
 
-static void vfio_host_win_add(VFIOContainer *container,
+void vfio_host_win_add(VFIOContainer *container,
                               hwaddr min_iova, hwaddr max_iova,
                               uint64_t iova_pgsizes)
 {
@@ -1495,7 +1495,7 @@ static const MemoryListener vfio_memory_listener = {
     .log_sync = vfio_listener_log_sync,
 };
 
-static void vfio_listener_release(VFIOContainer *container)
+void vfio_listener_release(VFIOContainer *container)
 {
     memory_listener_unregister(&container->listener);
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
@@ -1873,7 +1873,7 @@ static void vfio_kvm_device_del_group(VFIOGroup *group)
 #endif
 }
 
-static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
+VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
 {
     VFIOAddressSpace *space;
 
@@ -1893,7 +1893,7 @@ static VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
     return space;
 }
 
-static void vfio_put_address_space(VFIOAddressSpace *space)
+void vfio_put_address_space(VFIOAddressSpace *space)
 {
     if (QLIST_EMPTY(&space->containers)) {
         QLIST_REMOVE(space, list);
@@ -2024,6 +2024,34 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
     }
 }
 
+VFIOContainer *vfio_new_container(VFIOAddressSpace *space)
+{
+    VFIOContainer *container;
+
+    container = g_malloc0(sizeof(*container));
+    container->space = space;
+    container->error = NULL;
+    QLIST_INIT(&container->giommu_list);
+    QLIST_INIT(&container->hostwin_list);
+    QLIST_INIT(&container->vrdl_list);
+    QLIST_INIT(&container->group_list);
+
+    return container;
+}
+
+void vfio_link_container(VFIOContainer *container, VFIOGroup *group)
+{
+    VFIOAddressSpace *space = container->space;
+
+    QLIST_INSERT_HEAD(&space->containers, container, next);
+
+    group->container = container;
+    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
+
+    container->listener = vfio_memory_listener;
+    memory_listener_register(&container->listener, space->as);
+}
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                                   Error **errp)
 {
@@ -2099,16 +2127,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         goto close_fd_exit;
     }
 
-    container = g_malloc0(sizeof(*container));
-    container->space = space;
+    container = vfio_new_container(space);
     container->fd = fd;
-    container->error = NULL;
     container->dirty_pages_supported = false;
     container->dma_max_mappings = 0;
     container->io = &vfio_cont_io_ioctl;
-    QLIST_INIT(&container->giommu_list);
-    QLIST_INIT(&container->hostwin_list);
-    QLIST_INIT(&container->vrdl_list);
 
     ret = vfio_init_container(container, group->fd, errp);
     if (ret) {
@@ -2223,15 +2246,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     vfio_kvm_device_add_group(group);
 
-    QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
-
-    group->container = container;
-    QLIST_INSERT_HEAD(&container->group_list, group, container_next);
-
-    container->listener = vfio_memory_listener;
-
-    memory_listener_register(&container->listener, container->space->as);
+    vfio_link_container(container, group);
 
     if (container->error) {
         ret = -1;
@@ -2264,9 +2279,31 @@ put_space_exit:
     return ret;
 }
 
+void vfio_unmap_container(VFIOContainer *container)
+{
+    VFIOGuestIOMMU *giommu, *tmp;
+    VFIOHostDMAWindow *hostwin, *next;
+
+    QLIST_REMOVE(container, next);
+
+    QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
+        memory_region_unregister_iommu_notifier(
+            MEMORY_REGION(giommu->iommu_mr), &giommu->n);
+        QLIST_REMOVE(giommu, giommu_next);
+        g_free(giommu);
+    }
+
+    QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
+                       next) {
+        QLIST_REMOVE(hostwin, hostwin_next);
+        g_free(hostwin);
+    }
+}
+
 static void vfio_disconnect_container(VFIOGroup *group)
 {
     VFIOContainer *container = group->container;
+    VFIOAddressSpace *space = container->space;
 
     QLIST_REMOVE(group, container_next);
     group->container = NULL;
@@ -2286,24 +2323,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
     }
 
     if (QLIST_EMPTY(&container->group_list)) {
-        VFIOAddressSpace *space = container->space;
-        VFIOGuestIOMMU *giommu, *tmp;
-        VFIOHostDMAWindow *hostwin, *next;
-
-        QLIST_REMOVE(container, next);
-
-        QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
-            memory_region_unregister_iommu_notifier(
-                MEMORY_REGION(giommu->iommu_mr), &giommu->n);
-            QLIST_REMOVE(giommu, giommu_next);
-            g_free(giommu);
-        }
-
-        QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
-                           next) {
-            QLIST_REMOVE(hostwin, hostwin_next);
-            g_free(hostwin);
-        }
+        vfio_unmap_container(container);
 
         trace_vfio_disconnect_container(container->fd);
         close(container->fd);
@@ -2503,7 +2523,9 @@ void vfio_put_base_device(VFIODevice *vbasedev)
     QLIST_REMOVE(vbasedev, next);
     vbasedev->group = NULL;
     trace_vfio_put_base_device(vbasedev->fd);
-    close(vbasedev->fd);
+    if (vbasedev->fd != -1) {
+        close(vbasedev->fd);
+    }
 }
 
 int vfio_get_region_info(VFIODevice *vbasedev, int index,
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index bc1d01a..a0aa320 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -134,6 +134,7 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     VFIODevice *vbasedev = &vdev->vbasedev;
     SocketAddress addr;
     VFIOUserProxy *proxy;
+    VFIOGroup *group = NULL;
     int ret;
     Error *err = NULL;
 
@@ -180,8 +181,15 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     vbasedev->use_regfds = true;
     vbasedev->can_mask_irq = true;
 
-    ret = vfio_user_get_device(vbasedev, errp);
+    group = vfio_user_get_group(proxy, pci_device_iommu_address_space(pdev),
+                                errp);
+    if (!group) {
+        goto error;
+    }
+
+    ret = vfio_user_get_device(group, vbasedev, errp);
     if (ret) {
+        vfio_user_put_group(group);
         goto error;
     }
 
@@ -241,6 +249,7 @@ static void vfio_user_instance_finalize(Object *obj)
 {
     VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
     VFIODevice *vbasedev = &vdev->vbasedev;
+    VFIOGroup *group = vbasedev->group;
 
     vfio_bars_finalize(vdev);
     g_free(vdev->emulated_config_bits);
@@ -251,6 +260,7 @@ static void vfio_user_instance_finalize(Object *obj)
     }
 
     vfio_put_device(vdev);
+    vfio_user_put_group(group);
 
     if (vbasedev->proxy != NULL) {
         vfio_user_disconnect(vbasedev->proxy);
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index d66dc1b..aebf44c 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -18,10 +18,14 @@
 #include "hw/hw.h"
 #include "hw/vfio/vfio-common.h"
 #include "hw/vfio/vfio.h"
+#include "exec/address-spaces.h"
+#include "exec/memory.h"
+#include "exec/ram_addr.h"
 #include "qemu/sockets.h"
 #include "io/channel.h"
 #include "io/channel-socket.h"
 #include "io/channel-util.h"
+#include "sysemu/reset.h"
 #include "sysemu/iothread.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qjson.h"
@@ -847,7 +851,102 @@ void vfio_user_disconnect(VFIOUserProxy *proxy)
     g_free(proxy);
 }
 
-int vfio_user_get_device(VFIODevice *vbasedev, Error **errp)
+static int vfio_connect_proxy(VFIOUserProxy *proxy, VFIOGroup *group,
+                              AddressSpace *as, Error **errp)
+{
+    VFIOAddressSpace *space;
+    VFIOContainer *container;
+    int ret;
+
+    /*
+     * try to mirror vfio_connect_container()
+     * as much as possible
+     */
+
+    space = vfio_get_address_space(as);
+
+    container = vfio_new_container(space);
+    container->fd = -1;
+    container->io = &vfio_cont_io_sock;
+    container->proxy = proxy;
+
+    /*
+     * The proxy uses a SW IOMMU in lieu of the HW one
+     * used in the ioctl() version.  Masquerade as TYPE1
+     * for maximum compatibility
+     */
+    container->iommu_type = VFIO_TYPE1_IOMMU;
+
+    /*
+     * VFIO user allows the device server to map guest
+     * memory so it has the same issue with discards as
+     * a local IOMMU has.
+     */
+    ret = vfio_ram_block_discard_disable(container, true);
+    if (ret) {
+        error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
+        goto free_container_exit;
+    }
+
+    vfio_host_win_add(container, 0, (hwaddr)-1, proxy->dma_pgsizes);
+    container->pgsizes = proxy->dma_pgsizes;
+    container->dma_max_mappings = proxy->max_dma;
+
+    /* set up bitmask now, but migration support won't be ready until v2 */
+    container->dirty_pages_supported = true;
+    container->max_dirty_bitmap_size = proxy->max_bitmap;
+    container->dirty_pgsizes = proxy->migr_pgsize;
+
+    vfio_link_container(container, group);
+
+    if (container->error) {
+        ret = -1;
+        error_propagate_prepend(errp, container->error,
+            "memory listener initialization failed: ");
+        goto listener_release_exit;
+    }
+
+    container->initialized = true;
+
+    return 0;
+
+listener_release_exit:
+    QLIST_REMOVE(group, container_next);
+    QLIST_REMOVE(container, next);
+    vfio_listener_release(container);
+    vfio_ram_block_discard_disable(container, false);
+
+free_container_exit:
+    g_free(container);
+
+    vfio_put_address_space(space);
+
+    return ret;
+}
+
+static void vfio_disconnect_proxy(VFIOGroup *group)
+{
+    VFIOContainer *container = group->container;
+    VFIOAddressSpace *space = container->space;
+
+    /*
+     * try to mirror vfio_disconnect_container()
+     * as much as possible, knowing each device
+     * is in one group and one container
+     */
+
+    QLIST_REMOVE(group, container_next);
+    group->container = NULL;
+
+    memory_listener_unregister(&container->listener);
+
+    vfio_unmap_container(container);
+
+    g_free(container);
+    vfio_put_address_space(space);
+}
+
+int vfio_user_get_device(VFIOGroup *group, VFIODevice *vbasedev, Error **errp)
 {
     struct vfio_device_info info = { .argsz = sizeof(info) };
     int ret;
@@ -866,11 +965,57 @@ int vfio_user_get_device(VFIODevice *vbasedev, Error **errp)
     }
 
     vbasedev->fd = -1;
-    vfio_init_device(vbasedev, NULL, &info);
+    vfio_init_device(vbasedev, group, &info);
 
     return 0;
 }
 
+VFIOGroup *vfio_user_get_group(VFIOUserProxy *proxy, AddressSpace *as,
+                               Error **errp)
+{
+    VFIOGroup *group;
+
+    /*
+     * Mirror vfio_get_group(), except that each
+     * device gets its own group and container,
+     * unrelated to any host IOMMU groupings
+     */
+    group = g_malloc0(sizeof(*group));
+    group->fd = -1;
+    group->groupid = -1;
+    QLIST_INIT(&group->device_list);
+
+    if (vfio_connect_proxy(proxy, group, as, errp)) {
+        error_prepend(errp, "failed to connect proxy: ");
+        g_free(group);
+        return NULL;
+    }
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_register_reset(vfio_reset_handler, NULL);
+    }
+
+    QLIST_INSERT_HEAD(&vfio_group_list, group, next);
+
+    return group;
+}
+
+void vfio_user_put_group(VFIOGroup *group)
+{
+    if (!group || !QLIST_EMPTY(&group->device_list)) {
+        return;
+    }
+
+    vfio_ram_block_discard_disable(group->container, false);
+    vfio_disconnect_proxy(group);
+    QLIST_REMOVE(group, next);
+    g_free(group);
+
+    if (QLIST_EMPTY(&vfio_group_list)) {
+        qemu_unregister_reset(vfio_reset_handler, NULL);
+    }
+}
+
 static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
                                   uint32_t size, uint32_t flags)
 {
@@ -1463,3 +1608,6 @@ VFIODeviceIO vfio_dev_io_sock = {
     .region_read = vfio_user_io_region_read,
     .region_write = vfio_user_io_region_write,
 };
+
+VFIOContainerIO vfio_cont_io_sock = {
+};
-- 
1.9.4
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 17/23] vfio-user: dma map/unmap operations
Date: Wed, 1 Feb 2023 21:55:53 -0800
Message-Id: <1ec25a5832299083fee3c90bd89561f5c1d42ba9.1675228037.git.john.g.johnson@oracle.com>

Add ability to do async operations during memory transactions

Signed-off-by: Jagannathan Raman
Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
---
 hw/vfio/user-protocol.h       |  32 ++++++
 include/hw/vfio/vfio-common.h |   4 +-
 hw/vfio/common.c              |  64 +++++++++---
 hw/vfio/user.c                | 224 ++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/trace-events          |   2 +
 5 files changed, 311 insertions(+), 15 deletions(-)

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 48dd475..109076d 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -114,6 +114,31 @@ typedef struct {
 #define VFIO_USER_DEF_MAX_BITMAP (256 * 1024 * 1024)

 /*
+ * VFIO_USER_DMA_MAP
+ * imported from struct vfio_iommu_type1_dma_map
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint32_t argsz;
+    uint32_t flags;
+    uint64_t offset;    /* FD offset */
+    uint64_t iova;
+    uint64_t size;
+} VFIOUserDMAMap;
+
+/*
+ * VFIO_USER_DMA_UNMAP
+ * imported from struct vfio_iommu_type1_dma_unmap
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint32_t argsz;
+    uint32_t flags;
+    uint64_t iova;
+    uint64_t size;
+} VFIOUserDMAUnmap;
+
+/*
  * VFIO_USER_DEVICE_GET_INFO
  * imported from struct vfio_device_info
  */
@@ -176,4 +201,11 @@ typedef struct {
     char data[];
 } VFIOUserRegionRW;

+/* imported from struct vfio_bitmap */
+typedef struct {
+    uint64_t pgsize;
+    uint64_t size;
+    char data[];
+} VFIOUserBitmap;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b0c4453..ee6ad8f 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -90,6 +90,7 @@ typedef struct VFIOContainer {
     VFIOContainerIO *io;
     bool initialized;
     bool dirty_pages_supported;
+    bool async_ops;
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
@@ -187,7 +188,7 @@ struct VFIODeviceIO {
 };

 struct VFIOContainerIO {
-    int (*dma_map)(VFIOContainer *container,
+    int (*dma_map)(VFIOContainer *container, MemoryRegion *mr,
                    struct vfio_iommu_type1_dma_map *map);
     int (*dma_unmap)(VFIOContainer *container,
                      struct vfio_iommu_type1_dma_unmap *unmap,
@@ -195,6 +196,7 @@ struct VFIOContainerIO {
     int (*dirty_bitmap)(VFIOContainer *container,
                         struct vfio_iommu_type1_dirty_bitmap *bitmap,
                         struct vfio_iommu_type1_dirty_bitmap_get *range);
+    void (*wait_commit)(VFIOContainer *container);
 };

 extern VFIODeviceIO vfio_dev_io_ioctl;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6f99907..f04fd20 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -508,7 +508,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
     return container->io->dma_unmap(container, &unmap, NULL);
 }

-static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
+static int vfio_dma_map(VFIOContainer *container, MemoryRegion *mr, hwaddr iova,
                         ram_addr_t size, void *vaddr, bool readonly)
 {
     struct vfio_iommu_type1_dma_map map = {
@@ -524,8 +524,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
         map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
     }

-    ret = container->io->dma_map(container, &map);
-
+    ret = container->io->dma_map(container, mr, &map);
     if (ret < 0) {
         error_report("VFIO_MAP_DMA failed: %s", strerror(-ret));
     }
@@ -587,7 +586,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)

 /* Called with rcu_read_lock held. */
 static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
-                               ram_addr_t *ram_addr, bool *read_only)
+                               ram_addr_t *ram_addr, bool *read_only,
+                               MemoryRegion **mrp)
 {
     MemoryRegion *mr;
     hwaddr xlat;
@@ -668,6 +668,10 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
         *read_only = !writable || mr->readonly;
     }

+    if (mrp != NULL) {
+        *mrp = mr;
+    }
+
     return true;
 }

@@ -675,6 +679,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
     VFIOContainer *container = giommu->container;
+    MemoryRegion *mr;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     void *vaddr;
     int ret;
@@ -693,7 +698,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
         bool read_only;

-        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
+        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &mr)) {
             goto out;
         }
         /*
@@ -703,14 +708,14 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
          * of vaddr will always be there, even if the memory object is
          * destroyed and its backing memory munmap-ed.
          */
-        ret = vfio_dma_map(container, iova,
+        ret = vfio_dma_map(container, mr, iova,
                            iotlb->addr_mask + 1, vaddr,
                            read_only);
         if (ret) {
             error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                         "0x%"HWADDR_PRIx", %p) = %d (%m)",
+                         "0x%"HWADDR_PRIx", %p)",
                          container, iova,
-                         iotlb->addr_mask + 1, vaddr, ret);
+                         iotlb->addr_mask + 1, vaddr);
         }
     } else {
         ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
@@ -765,7 +770,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                section->offset_within_address_space;
         vaddr = memory_region_get_ram_ptr(section->mr) + start;

-        ret = vfio_dma_map(vrdl->container, iova, next - start,
+        ret = vfio_dma_map(vrdl->container, section->mr, iova, next - start,
                            vaddr, section->readonly);
         if (ret) {
             /* Rollback */
@@ -889,6 +894,29 @@ static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
     return true;
 }

+static void vfio_listener_begin(MemoryListener *listener)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+    /*
+     * When DMA space is the physical address space,
+     * the region add/del listeners will fire during
+     * memory update transactions. These depend on BQL
+     * being held, so do any resulting map/unmap ops async
+     * while keeping BQL.
+     */
+    container->async_ops = true;
+}
+
+static void vfio_listener_commit(MemoryListener *listener)
+{
+    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+    /* wait here for any async requests sent during the transaction */
+    container->io->wait_commit(container);
+    container->async_ops = false;
+}
+
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
@@ -1096,12 +1124,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
         }
     }

-    ret = vfio_dma_map(container, iova, int128_get64(llsize),
+    ret = vfio_dma_map(container, section->mr, iova, int128_get64(llsize),
                        vaddr, section->readonly);
     if (ret) {
         error_setg(&err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
-                   "0x%"HWADDR_PRIx", %p) = %d (%m)",
-                   container, iova, int128_get64(llsize), vaddr, ret);
+                   "0x%"HWADDR_PRIx", %p)",
+                   container, iova, int128_get64(llsize), vaddr);
         if (memory_region_is_ram_device(section->mr)) {
             /* Allow unexpected mappings not to be fatal for RAM devices */
             error_report_err(err);
@@ -1370,7 +1398,7 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     }

     rcu_read_lock();
-    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
+    if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, NULL)) {
         int ret;

         ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
@@ -1488,6 +1516,8 @@ static void vfio_listener_log_sync(MemoryListener *listener,

 static const MemoryListener vfio_memory_listener = {
     .name = "vfio",
+    .begin = vfio_listener_begin,
+    .commit = vfio_listener_commit,
     .region_add = vfio_listener_region_add,
     .region_del = vfio_listener_region_del,
     .log_global_start = vfio_listener_log_global_start,
@@ -2788,7 +2818,7 @@ VFIODeviceIO vfio_dev_io_ioctl = {
     .region_write = vfio_io_region_write,
 };

-static int vfio_io_dma_map(VFIOContainer *container,
+static int vfio_io_dma_map(VFIOContainer *container, MemoryRegion *mr,
                            struct vfio_iommu_type1_dma_map *map)
 {

@@ -2848,8 +2878,14 @@ static int vfio_io_dirty_bitmap(VFIOContainer *container,
     return ret < 0 ? -errno : ret;
 }

+static void vfio_io_wait_commit(VFIOContainer *container)
+{
+    /* ioctl()s are synchronous */
+}
+
 static VFIOContainerIO vfio_cont_io_ioctl = {
     .dma_map = vfio_io_dma_map,
     .dma_unmap = vfio_io_dma_unmap,
     .dirty_bitmap = vfio_io_dirty_bitmap,
+    .wait_commit = vfio_io_wait_commit,
 };
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index aebf44c..6dee775 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -64,8 +64,11 @@ static void vfio_user_request(void *opaque);
 static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg);
 static void vfio_user_send_async(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                  VFIOUserFDs *fds);
+static void vfio_user_send_nowait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                  VFIOUserFDs *fds, int rsize);
 static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                 VFIOUserFDs *fds, int rsize, bool nobql);
+static void vfio_user_wait_reqs(VFIOUserProxy *proxy);
 static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
                                   uint32_t size, uint32_t flags);

@@ -664,6 +667,36 @@ static void vfio_user_send_async(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
     }
 }

+/*
+ * nowait send - vfio_user_wait_reqs() can wait for it later
+ */
+static void vfio_user_send_nowait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
+                                  VFIOUserFDs *fds, int rsize)
+{
+    VFIOUserMsg *msg;
+    int ret;
+
+    if (hdr->flags & VFIO_USER_NO_REPLY) {
+        error_printf("vfio_user_send_nowait on async message\n");
+        return;
+    }
+
+    QEMU_LOCK_GUARD(&proxy->lock);
+
+    msg = vfio_user_getmsg(proxy, hdr, fds);
+    msg->id = hdr->id;
+    msg->rsize = rsize ? rsize : hdr->size;
+    msg->type = VFIO_MSG_NOWAIT;
+
+    ret = vfio_user_send_queued(proxy, msg);
+    if (ret < 0) {
+        vfio_user_recycle(proxy, msg);
+        return;
+    }
+
+    proxy->last_nowait = msg;
+}
+
 static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
                                 VFIOUserFDs *fds, int rsize, bool nobql)
 {
@@ -717,6 +750,60 @@ static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
     }
 }

+static void vfio_user_wait_reqs(VFIOUserProxy *proxy)
+{
+    VFIOUserMsg *msg;
+    bool iolock = false;
+
+    /*
+     * Any DMA map/unmap requests sent in the middle
+     * of a memory region transaction were sent nowait.
+     * Wait for them here.
+     */
+    qemu_mutex_lock(&proxy->lock);
+    if (proxy->last_nowait != NULL) {
+        iolock = qemu_mutex_iothread_locked();
+        if (iolock) {
+            qemu_mutex_unlock_iothread();
+        }
+
+        /*
+         * Change type to WAIT to wait for reply
+         */
+        msg = proxy->last_nowait;
+        msg->type = VFIO_MSG_WAIT;
+        proxy->last_nowait = NULL;
+        while (!msg->complete) {
+            if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, wait_time)) {
+                VFIOUserMsgQ *list;
+
+                list = msg->pending ? &proxy->pending : &proxy->outgoing;
+                QTAILQ_REMOVE(list, msg, next);
+                error_printf("vfio_user_wait_reqs - timed out\n");
+                break;
+            }
+        }
+
+        if (msg->hdr->flags & VFIO_USER_ERROR) {
+            error_printf("vfio_user_wait_reqs - error reply on async ");
+            error_printf("request: command %x error %s\n", msg->hdr->command,
+                         strerror(msg->hdr->error_reply));
+        }
+
+        /*
+         * Change type back to NOWAIT to free
+         */
+        msg->type = VFIO_MSG_NOWAIT;
+        vfio_user_recycle(proxy, msg);
+    }
+
+    /* lock order is BQL->proxy - don't hold proxy when getting BQL */
+    qemu_mutex_unlock(&proxy->lock);
+    if (iolock) {
+        qemu_mutex_lock_iothread();
+    }
+}
+
 static QLIST_HEAD(, VFIOUserProxy) vfio_user_sockets =
     QLIST_HEAD_INITIALIZER(vfio_user_sockets);

@@ -1298,6 +1385,107 @@ int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp)
     return 0;
 }

+static int vfio_user_dma_map(VFIOUserProxy *proxy,
+                             struct vfio_iommu_type1_dma_map *map,
+                             int fd, bool will_commit)
+{
+    VFIOUserFDs *fds = NULL;
+    VFIOUserDMAMap *msgp = g_malloc0(sizeof(*msgp));
+    int ret;
+
+    vfio_user_request_msg(&msgp->hdr, VFIO_USER_DMA_MAP, sizeof(*msgp), 0);
+    msgp->argsz = map->argsz;
+    msgp->flags = map->flags;
+    msgp->offset = map->vaddr;
+    msgp->iova = map->iova;
+    msgp->size = map->size;
+    trace_vfio_user_dma_map(msgp->iova, msgp->size, msgp->offset, msgp->flags,
+                            will_commit);
+
+    /*
+     * The will_commit case sends without blocking or dropping BQL;
+     * such requests are waited for later in vfio_user_wait_reqs().
+     */
+    if (will_commit) {
+        /* can't use auto variable since we don't block */
+        if (fd != -1) {
+            fds = vfio_user_getfds(1);
+            fds->send_fds = 1;
+            fds->fds[0] = fd;
+        }
+        vfio_user_send_nowait(proxy, &msgp->hdr, fds, 0);
+        ret = 0;
+    } else {
+        VFIOUserFDs local_fds = { 1, 0, &fd };
+
+        fds = fd != -1 ? &local_fds : NULL;
+        vfio_user_send_wait(proxy, &msgp->hdr, fds, 0, will_commit);
+        ret = (msgp->hdr.flags & VFIO_USER_ERROR) ? -msgp->hdr.error_reply : 0;
+        g_free(msgp);
+    }
+
+    return ret;
+}
+
+static int vfio_user_dma_unmap(VFIOUserProxy *proxy,
+                               struct vfio_iommu_type1_dma_unmap *unmap,
+                               struct vfio_bitmap *bitmap, bool will_commit)
+{
+    struct {
+        VFIOUserDMAUnmap msg;
+        VFIOUserBitmap bitmap;
+    } *msgp = NULL;
+    int msize, rsize;
+    bool blocking = !will_commit;
+
+    if (bitmap == NULL &&
+        (unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)) {
+        error_printf("vfio_user_dma_unmap mismatched flags and bitmap\n");
+        return -EINVAL;
+    }
+
+    /*
+     * If a dirty bitmap is returned, allocate extra space for it
+     * and block for reply even in the will_commit case.
+     * Otherwise, the unmap request can be sent without waiting.
+     */
+    if (bitmap != NULL) {
+        blocking = true;
+        msize = sizeof(*msgp);
+        rsize = msize + bitmap->size;
+        msgp = g_malloc0(rsize);
+        msgp->bitmap.pgsize = bitmap->pgsize;
+        msgp->bitmap.size = bitmap->size;
+    } else {
+        msize = rsize = sizeof(VFIOUserDMAUnmap);
+        msgp = g_malloc0(rsize);
+    }
+
+    vfio_user_request_msg(&msgp->msg.hdr, VFIO_USER_DMA_UNMAP, msize, 0);
+    msgp->msg.argsz = rsize - sizeof(VFIOUserHdr);
+    msgp->msg.argsz = unmap->argsz;
+    msgp->msg.flags = unmap->flags;
+    msgp->msg.iova = unmap->iova;
+    msgp->msg.size = unmap->size;
+    trace_vfio_user_dma_unmap(msgp->msg.iova, msgp->msg.size, msgp->msg.flags,
+                              bitmap != NULL, will_commit);
+
+    if (blocking) {
+        vfio_user_send_wait(proxy, &msgp->msg.hdr, NULL, rsize, will_commit);
+        if (msgp->msg.hdr.flags & VFIO_USER_ERROR) {
+            return -msgp->msg.hdr.error_reply;
+        }
+        if (bitmap != NULL) {
+            memcpy(bitmap->data, &msgp->bitmap.data, bitmap->size);
+        }
+        g_free(msgp);
+    } else {
+        vfio_user_send_nowait(proxy, &msgp->msg.hdr, NULL, rsize);
+    }
+
+    return 0;
+}
+
 static int vfio_user_get_info(VFIOUserProxy *proxy,
                               struct vfio_device_info *info)
 {
@@ -1609,5 +1797,41 @@ VFIODeviceIO vfio_dev_io_sock = {
     .region_write = vfio_user_io_region_write,
 };

+static int vfio_user_io_dma_map(VFIOContainer *container, MemoryRegion *mr,
+                                struct vfio_iommu_type1_dma_map *map)
+{
+    int fd = memory_region_get_fd(mr);
+
+    /*
+     * map->vaddr enters as a QEMU process address;
+     * make it either a file offset for mapped areas or 0
+     */
+    if (fd != -1) {
+        void *addr = (void *)(uintptr_t)map->vaddr;
+
+        map->vaddr = qemu_ram_block_host_offset(mr->ram_block, addr);
+    } else {
+        map->vaddr = 0;
+    }
+
+    return vfio_user_dma_map(container->proxy, map, fd, container->async_ops);
+}
+
+static int vfio_user_io_dma_unmap(VFIOContainer *container,
+                                  struct vfio_iommu_type1_dma_unmap *unmap,
+                                  struct vfio_bitmap *bitmap)
+{
+    return vfio_user_dma_unmap(container->proxy, unmap, bitmap,
+                               container->async_ops);
+}
+
+static void vfio_user_io_wait_commit(VFIOContainer *container)
+{
+    vfio_user_wait_reqs(container->proxy);
+}
+
 VFIOContainerIO vfio_cont_io_sock = {
+    .dma_map = vfio_user_io_dma_map,
+    .dma_unmap = vfio_user_io_dma_unmap,
+    .wait_commit = vfio_user_io_wait_commit,
 };
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 01563cb..a4e02ff 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -178,3 +178,5 @@ vfio_user_get_region_info(uint32_t index, uint32_t flags, uint64_t size) " index
 vfio_user_region_rw(uint32_t region, uint64_t off, uint32_t count) " region %d offset 0x%"PRIx64" count %d"
 vfio_user_get_irq_info(uint32_t index, uint32_t flags, uint32_t count) " index %d flags 0x%x count %d"
 vfio_user_set_irqs(uint32_t index, uint32_t start, uint32_t count, uint32_t flags) " index %d start %d count %d flags 0x%x"
+vfio_user_dma_map(uint64_t iova, uint64_t size, uint64_t off, uint32_t flags, bool will_commit) " iova 0x%"PRIx64" size 0x%"PRIx64" off 0x%"PRIx64" flags 0x%x will_commit %d"
+vfio_user_dma_unmap(uint64_t iova, uint64_t size, uint32_t flags, bool dirty, bool will_commit) " iova 0x%"PRIx64" size 0x%"PRIx64" flags 0x%x dirty %d will_commit %d"
-- 
1.9.4
From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 18/23] vfio-user: add dma_unmap_all
Date: Wed, 1 Feb 2023 21:55:54 -0800
Message-Id: <20fc8b4bb94583ef41d289db3831a9d07a0eae02.1675228037.git.john.g.johnson@oracle.com>

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/common.c              | 45 ++++++++++++++++++++++++++++++++++---------
 hw/vfio/user.c                | 24 +++++++++++++++++++++++
 3 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ee6ad8f..abef9b4 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -193,6 +193,7 @@ struct VFIOContainerIO {
     int (*dma_unmap)(VFIOContainer *container,
                      struct vfio_iommu_type1_dma_unmap *unmap,
                      struct vfio_bitmap *bitmap);
+    int (*dma_unmap_all)(VFIOContainer *container, uint32_t flags);
     int (*dirty_bitmap)(VFIOContainer *container,
                         struct vfio_iommu_type1_dirty_bitmap *bitmap,
                         struct vfio_iommu_type1_dirty_bitmap_get *range);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f04fd20..8b55fbb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -508,6 +508,14 @@ static int vfio_dma_unmap(VFIOContainer *container,
     return container->io->dma_unmap(container, &unmap, NULL);
 }

+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_dma_unmap_all(VFIOContainer *container)
+{
+    return container->io->dma_unmap_all(container, VFIO_DMA_UNMAP_FLAG_ALL);
+}
+
 static int vfio_dma_map(VFIOContainer *container, MemoryRegion *mr, hwaddr iova,
                         ram_addr_t size, void *vaddr, bool readonly)
 {
@@ -1256,17 +1264,10 @@ static void vfio_listener_region_del(MemoryListener *listener,

     if (try_unmap) {
         if (int128_eq(llsize, int128_2_64())) {
-            /* The unmap ioctl doesn't accept a full 64-bit span. */
-            llsize = int128_rshift(llsize, 1);
+            ret = vfio_dma_unmap_all(container);
+        } else {
             ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
-            if (ret) {
-                error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                             "0x%"HWADDR_PRIx") = %d (%m)",
-                             container, iova, int128_get64(llsize), ret);
-            }
-            iova += int128_get64(llsize);
         }
-        ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
         if (ret) {
             error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%m)",
@@ -2867,6 +2868,31 @@ static int vfio_io_dma_unmap(VFIOContainer *container,
     return 0;
 }

+static int vfio_io_dma_unmap_all(VFIOContainer *container, uint32_t flags)
+{
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = 0,
+        .size = 0x8000000000000000,
+    };
+    int ret;
+
+    /* The unmap ioctl doesn't accept a full 64-bit span. */
+    unmap.iova = 0;
+    ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
+    if (ret) {
+        return -errno;
+    }
+
+    unmap.iova += unmap.size;
+    ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap);
+    if (ret) {
+        return -errno;
+    }
+
+    return 0;
+}
+
 static int vfio_io_dirty_bitmap(VFIOContainer *container,
                                 struct vfio_iommu_type1_dirty_bitmap *bitmap,
                                 struct vfio_iommu_type1_dirty_bitmap_get *range)
@@ -2886,6 +2912,7 @@ static void vfio_io_wait_commit(VFIOContainer *container)
 static VFIOContainerIO vfio_cont_io_ioctl = {
     .dma_map = vfio_io_dma_map,
     .dma_unmap = vfio_io_dma_unmap,
+    .dma_unmap_all = vfio_io_dma_unmap_all,
     .dirty_bitmap = vfio_io_dirty_bitmap,
     .wait_commit = vfio_io_wait_commit,
 };
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 6dee775..fe6e476 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1825,6 +1825,28 @@ static int vfio_user_io_dma_unmap(VFIOContainer *container,
                                container->async_ops);
 }

+static int vfio_user_io_dma_unmap_all(VFIOContainer *container, uint32_t flags)
+{
+    struct vfio_iommu_type1_dma_unmap unmap = {
+        .argsz = sizeof(unmap),
+        .flags = flags | VFIO_DMA_UNMAP_FLAG_ALL,
+        .iova = 0,
+        .size = 0,
+    };
+
+    return vfio_user_dma_unmap(container->proxy, &unmap, NULL,
+                               container->async_ops);
+}
+
+static int vfio_user_io_dirty_bitmap(VFIOContainer *container,
+                        struct vfio_iommu_type1_dirty_bitmap *bitmap,
+                        struct vfio_iommu_type1_dirty_bitmap_get *range)
+{
+
+    /* vfio-user doesn't support migration */
+    return -EINVAL;
+}
+
 static void vfio_user_io_wait_commit(VFIOContainer *container)
 {
     vfio_user_wait_reqs(container->proxy);
@@ -1833,5 +1855,7 @@ static void vfio_user_io_wait_commit(VFIOContainer *container)
 VFIOContainerIO vfio_cont_io_sock = {
     .dma_map = vfio_user_io_dma_map,
     .dma_unmap = vfio_user_io_dma_unmap,
+    .dma_unmap_all = vfio_user_io_dma_unmap_all,
+    .dirty_bitmap = vfio_user_io_dirty_bitmap,
     .wait_commit = vfio_user_io_wait_commit,
 };
-- 
1.9.4
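`vfio_io_dma_unmap_all()` above issues two `VFIO_IOMMU_UNMAP_DMA` ioctls instead of one. The reason is that a 64-bit `size` field can express at most 2^64 - 1 bytes, so the full IOVA space [0, 2^64) has to be covered as two 2^63-byte halves. The standalone C below is a minimal sketch of that arithmetic only. `struct unmap_range` and `split_full_iova_space()` are hypothetical names, and the sketch issues no ioctl.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the (iova, size) pair of one unmap request. */
struct unmap_range {
    uint64_t iova;
    uint64_t size;
};

/*
 * Cover [0, 2^64) with two requests of 2^63 bytes each, mirroring the
 * two ioctls in vfio_io_dma_unmap_all(): the second request starts where
 * the first one ends (unmap.iova += unmap.size).
 */
void split_full_iova_space(struct unmap_range out[2])
{
    const uint64_t half = UINT64_C(1) << 63;   /* 0x8000000000000000 */

    out[0].iova = 0;
    out[0].size = half;
    out[1].iova = out[0].iova + out[0].size;   /* 0x8000000000000000 */
    out[1].size = half;
}
```

The vfio-user path does not need this split. `vfio_user_io_dma_unmap_all()` sends a single request with `VFIO_DMA_UNMAP_FLAG_ALL` set and both `iova` and `size` set to zero.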

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 19/23] vfio-user: no-mmap DMA support
Date: Wed, 1 Feb 2023 21:55:55 -0800

Force remote process to use DMA r/w messages instead of directly mapping
guest memory.

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user.h     | 1 +
 hw/vfio/user-pci.c | 5 +++++
 hw/vfio/user.c     | 2 +-
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index b89e5ca..fe0115b 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -83,6 +83,7 @@ typedef struct VFIOUserProxy {
 
 /* VFIOProxy flags */
 #define VFIO_PROXY_CLIENT        0x1
+#define VFIO_PROXY_NO_MMAP       0x2
 #define VFIO_PROXY_FORCE_QUEUED  0x4
 #define VFIO_PROXY_NO_POST       0x8
 
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index a0aa320..bf84d7c 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -40,6 +40,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(VFIOUserPCIDevice, VFIO_USER_PCI)
 struct VFIOUserPCIDevice {
     VFIOPCIDevice device;
     char *sock_name;
+    bool no_direct_dma; /* disable shared mem for DMA */
     bool send_queued;   /* all sends are queued */
     bool no_post;       /* all regions write are sync */
 };
@@ -160,6 +161,9 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Error **errp)
     vbasedev->proxy = proxy;
     vfio_user_set_handler(vbasedev, vfio_user_pci_process_req, vdev);
 
+    if (udev->no_direct_dma) {
+        proxy->flags |= VFIO_PROXY_NO_MMAP;
+    }
     if (udev->send_queued) {
         proxy->flags |= VFIO_PROXY_FORCE_QUEUED;
     }
@@ -269,6 +273,7 @@ static void vfio_user_instance_finalize(Object *obj)
 
 static Property vfio_user_pci_dev_properties[] = {
     DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
+    DEFINE_PROP_BOOL("no-direct-dma", VFIOUserPCIDevice, no_direct_dma, false),
     DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, false),
     DEFINE_PROP_BOOL("x-no-posted-writes", VFIOUserPCIDevice, no_post, false),
     DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index fe6e476..0a7b354 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1806,7 +1806,7 @@ static int vfio_user_io_dma_map(VFIOContainer *container, MemoryRegion *mr,
      * map->vaddr enters as a QEMU process address
      * make it either a file offset for mapped areas or 0
      */
-    if (fd != -1) {
+    if (fd != -1 && (container->proxy->flags & VFIO_PROXY_NO_MMAP) == 0) {
         void *addr = (void *)(uintptr_t)map->vaddr;
 
         map->vaddr = qemu_ram_block_host_offset(mr->ram_block, addr);
-- 
1.9.4
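The `no-direct-dma` property changes one decision in `vfio_user_io_dma_map()`. A RAM block's fd is shared, and `vaddr` is rewritten as a file offset, only when the block has an fd and `VFIO_PROXY_NO_MMAP` is clear. Otherwise the server has to use DMA read/write messages. The sketch below captures that decision. The hunk shows only the condition, so the fallback values (`fd = -1`, `vaddr = 0`) are an assumption drawn from the code comment "make it either a file offset for mapped areas or 0". `struct dma_map_plan`, `plan_dma_map()` and its `file_offset` parameter (a stand-in for `qemu_ram_block_host_offset()`) are hypothetical. The flag values are copied from the `hw/vfio/user.h` hunk.

```c
#include <assert.h>
#include <stdint.h>

/* VFIOProxy flag values as defined in hw/vfio/user.h above. */
#define VFIO_PROXY_CLIENT        0x1
#define VFIO_PROXY_NO_MMAP       0x2
#define VFIO_PROXY_FORCE_QUEUED  0x4
#define VFIO_PROXY_NO_POST       0x8

/* Hypothetical summary of what one DMA_MAP request would carry. */
struct dma_map_plan {
    int fd;          /* fd passed to the server, or -1 (assumed fallback) */
    uint64_t vaddr;  /* file offset when the fd is shared, else 0 */
};

/*
 * fd: the RAM block's backing fd, -1 if it has none.
 * file_offset: stand-in for qemu_ram_block_host_offset().
 */
struct dma_map_plan plan_dma_map(int fd, uint32_t proxy_flags,
                                 uint64_t file_offset)
{
    struct dma_map_plan plan = { .fd = -1, .vaddr = 0 };

    /* Same condition as the patched check in vfio_user_io_dma_map(). */
    if (fd != -1 && (proxy_flags & VFIO_PROXY_NO_MMAP) == 0) {
        plan.fd = fd;
        plan.vaddr = file_offset;
    }
    return plan;
}
```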

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 20/23] vfio-user: dma read/write operations
Date: Wed, 1 Feb 2023 21:55:56 -0800

Messages from server to client that perform device DMA.

Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user-protocol.h |  11 +++++
 hw/vfio/user.h          |   3 ++
 hw/vfio/user-pci.c      | 110 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/user.c          |  57 +++++++++++++++++++++++++
 4 files changed, 181 insertions(+)

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 109076d..1a40cca 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -201,6 +201,17 @@ typedef struct {
     char data[];
 } VFIOUserRegionRW;
 
+/*
+ * VFIO_USER_DMA_READ
+ * VFIO_USER_DMA_WRITE
+ */
+typedef struct {
+    VFIOUserHdr hdr;
+    uint64_t offset;
+    uint32_t count;
+    char data[];
+} VFIOUserDMARW;
+
 /*imported from struct vfio_bitmap */
 typedef struct {
     uint64_t pgsize;
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index fe0115b..ae7654f 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -97,6 +97,9 @@ VFIOGroup *vfio_user_get_group(VFIOUserProxy *proxy, AddressSpace *as,
                                Error **errp);
 void vfio_user_put_group(VFIOGroup *group);
 int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp);
+void vfio_user_send_reply(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int size);
+void vfio_user_send_error(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int error);
+void vfio_user_putfds(VFIOUserMsg *msg);
 
 extern VFIODeviceIO vfio_dev_io_sock;
 extern VFIOContainerIO vfio_cont_io_sock;
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index bf84d7c..6465b1c 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -101,6 +101,95 @@ static void vfio_user_msix_teardown(VFIOPCIDevice *vdev)
     vdev->msix->pba_region = NULL;
 }
 
+static void vfio_user_dma_read(VFIOPCIDevice *vdev, VFIOUserDMARW *msg)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    VFIOUserProxy *proxy = vdev->vbasedev.proxy;
+    VFIOUserDMARW *res;
+    MemTxResult r;
+    size_t size;
+
+    if (msg->hdr.size < sizeof(*msg)) {
+        vfio_user_send_error(proxy, &msg->hdr, EINVAL);
+        return;
+    }
+    if (msg->count > proxy->max_xfer_size) {
+        vfio_user_send_error(proxy, &msg->hdr, E2BIG);
+        return;
+    }
+
+    /* switch to our own message buffer */
+    size = msg->count + sizeof(VFIOUserDMARW);
+    res = g_malloc0(size);
+    memcpy(res, msg, sizeof(*res));
+    g_free(msg);
+
+    r = pci_dma_read(pdev, res->offset, &res->data, res->count);
+
+    switch (r) {
+    case MEMTX_OK:
+        if (res->hdr.flags & VFIO_USER_NO_REPLY) {
+            g_free(res);
+            return;
+        }
+        vfio_user_send_reply(proxy, &res->hdr, size);
+        break;
+    case MEMTX_ERROR:
+        vfio_user_send_error(proxy, &res->hdr, EFAULT);
+        break;
+    case MEMTX_DECODE_ERROR:
+        vfio_user_send_error(proxy, &res->hdr, ENODEV);
+        break;
+    case MEMTX_ACCESS_ERROR:
+        vfio_user_send_error(proxy, &res->hdr, EPERM);
+        break;
+    default:
+        error_printf("vfio_user_dma_read unknown error %d\n", r);
+        vfio_user_send_error(vdev->vbasedev.proxy, &res->hdr, EINVAL);
+    }
+}
+
+static void vfio_user_dma_write(VFIOPCIDevice *vdev, VFIOUserDMARW *msg)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    VFIOUserProxy *proxy = vdev->vbasedev.proxy;
+    MemTxResult r;
+
+    if (msg->hdr.size < sizeof(*msg)) {
+        vfio_user_send_error(proxy, &msg->hdr, EINVAL);
+        return;
+    }
+    /* make sure transfer count isn't larger than the message data */
+    if (msg->count > msg->hdr.size - sizeof(*msg)) {
+        vfio_user_send_error(proxy, &msg->hdr, E2BIG);
+        return;
+    }
+
+    r = pci_dma_write(pdev, msg->offset, &msg->data, msg->count);
+
+    switch (r) {
+    case MEMTX_OK:
+        if ((msg->hdr.flags & VFIO_USER_NO_REPLY) == 0) {
+            vfio_user_send_reply(proxy, &msg->hdr, sizeof(msg->hdr));
+        } else {
+            g_free(msg);
+        }
+        break;
+    case MEMTX_ERROR:
+        vfio_user_send_error(proxy, &msg->hdr, EFAULT);
+        break;
+    case MEMTX_DECODE_ERROR:
+        vfio_user_send_error(proxy, &msg->hdr, ENODEV);
+        break;
+    case MEMTX_ACCESS_ERROR:
+        vfio_user_send_error(proxy, &msg->hdr, EPERM);
+        break;
+    default:
+        error_printf("vfio_user_dma_write unknown error %d\n", r);
+        vfio_user_send_error(vdev->vbasedev.proxy, &msg->hdr, EINVAL);
+    }
+}
+
 /*
  * Incoming request message callback.
  *
@@ -108,7 +197,28 @@ static void vfio_user_msix_teardown(VFIOPCIDevice *vdev)
  */
 static void vfio_user_pci_process_req(void *opaque, VFIOUserMsg *msg)
 {
+    VFIOPCIDevice *vdev = opaque;
+    VFIOUserHdr *hdr = msg->hdr;
+
+    /* no incoming PCI requests pass FDs */
+    if (msg->fds != NULL) {
+        vfio_user_send_error(vdev->vbasedev.proxy, hdr, EINVAL);
+        vfio_user_putfds(msg);
+        return;
+    }
 
+    switch (hdr->command) {
+    case VFIO_USER_DMA_READ:
+        vfio_user_dma_read(vdev, (VFIOUserDMARW *)hdr);
+        break;
+    case VFIO_USER_DMA_WRITE:
+        vfio_user_dma_write(vdev, (VFIOUserDMARW *)hdr);
+        break;
+    default:
+        error_printf("vfio_user_pci_process_req unknown cmd %d\n",
+                     hdr->command);
+        vfio_user_send_error(vdev->vbasedev.proxy, hdr, ENOSYS);
+    }
 }
 
 /*
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 0a7b354..3aabf6b 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -394,6 +394,10 @@ static int vfio_user_recv_one(VFIOUserProxy *proxy)
         *msg->hdr = hdr;
         data = (char *)msg->hdr + sizeof(hdr);
     } else {
+        if (hdr.size > proxy->max_xfer_size + sizeof(VFIOUserDMARW)) {
+            error_setg(&local_err, "vfio_user_recv request larger than max");
+            goto err;
+        }
         buf = g_malloc0(hdr.size);
         memcpy(buf, &hdr, sizeof(hdr));
         data = buf + sizeof(hdr);
@@ -804,6 +808,59 @@ static void vfio_user_wait_reqs(VFIOUserProxy *proxy)
     }
 }
 
+/*
+ * Reply to an incoming request.
+ */
+void vfio_user_send_reply(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int size)
+{
+
+    if (size < sizeof(VFIOUserHdr)) {
+        error_printf("vfio_user_send_reply - size too small\n");
+        g_free(hdr);
+        return;
+    }
+
+    /*
+     * convert header to associated reply
+     */
+    hdr->flags = VFIO_USER_REPLY;
+    hdr->size = size;
+
+    vfio_user_send_async(proxy, hdr, NULL);
+}
+
+/*
+ * Send an error reply to an incoming request.
+ */
+void vfio_user_send_error(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int error)
+{
+
+    /*
+     * convert header to associated reply
+     */
+    hdr->flags = VFIO_USER_REPLY;
+    hdr->flags |= VFIO_USER_ERROR;
+    hdr->error_reply = error;
+    hdr->size = sizeof(*hdr);
+
+    vfio_user_send_async(proxy, hdr, NULL);
+}
+
+/*
+ * Close FDs erroneously received in an incoming request.
+ */
+void vfio_user_putfds(VFIOUserMsg *msg)
+{
+    VFIOUserFDs *fds = msg->fds;
+    int i;
+
+    for (i = 0; i < fds->recv_fds; i++) {
+        close(fds->fds[i]);
+    }
+    g_free(fds);
+    msg->fds = NULL;
+}
+
 static QLIST_HEAD(, VFIOUserProxy) vfio_user_sockets =
     QLIST_HEAD_INITIALIZER(vfio_user_sockets);
 
-- 
1.9.4
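The DMA handlers above validate the two message types differently:

* A `VFIO_USER_DMA_WRITE` request carries its payload, so `count` must not exceed the bytes actually received after the fixed part.
* A `VFIO_USER_DMA_READ` request carries no data, so `count` is bounded by `max_xfer_size` instead. That limit is the size of the reply the client will build.

Separately, `vfio_user_recv_one()` rejects any request larger than `max_xfer_size + sizeof(VFIOUserDMARW)`. After the access, each `MemTxResult` is mapped to an errno for the error reply. The sketch below restates those checks and that mapping as pure functions. The `sketch_memtx` enum is a local stand-in whose values do not match QEMU's `MemTxResult` definitions, the function names are hypothetical, and the fixed-part size is a parameter rather than a real `sizeof(VFIOUserDMARW)`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for QEMU's MemTxResult codes; values are arbitrary. */
enum sketch_memtx {
    SKETCH_MEMTX_OK,
    SKETCH_MEMTX_ERROR,
    SKETCH_MEMTX_DECODE_ERROR,
    SKETCH_MEMTX_ACCESS_ERROR,
};

/*
 * DMA_WRITE: msg_size is hdr.size (the whole message) and fixed_size is the
 * size of the fixed part (header, offset, count). Returns 0 or a positive
 * errno, in the same order as the checks in vfio_user_dma_write().
 */
int check_dma_write(uint32_t msg_size, uint32_t count, size_t fixed_size)
{
    if (msg_size < fixed_size) {
        return EINVAL;   /* too short to hold offset and count */
    }
    if (count > msg_size - fixed_size) {
        return E2BIG;    /* count claims more payload than was sent */
    }
    return 0;
}

/* DMA_READ: no payload in the request, but the reply must fit max_xfer. */
int check_dma_read(uint32_t msg_size, uint32_t count, size_t fixed_size,
                   uint64_t max_xfer)
{
    if (msg_size < fixed_size) {
        return EINVAL;
    }
    if (count > max_xfer) {
        return E2BIG;
    }
    return 0;
}

/* errno placed in error_reply for each memory-transaction outcome. */
int memtx_to_errno(enum sketch_memtx r)
{
    switch (r) {
    case SKETCH_MEMTX_OK:
        return 0;
    case SKETCH_MEMTX_ERROR:
        return EFAULT;
    case SKETCH_MEMTX_DECODE_ERROR:
        return ENODEV;
    case SKETCH_MEMTX_ACCESS_ERROR:
        return EPERM;
    default:
        return EINVAL;   /* unknown result, as in the patch's default case */
    }
}
```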

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 21/23] vfio-user: pci reset
Date: Wed, 1 Feb 2023 21:55:57 -0800
Message-Id: <564957e327a9d9a435b0544140ca97e09f7adb48.1675228037.git.john.g.johnson@oracle.com>

Message to tell the server to reset the device.

Signed-off-by: Elena Ufimtseva
Signed-off-by: John G Johnson
Signed-off-by: Jagannathan Raman
---
 hw/vfio/pci.h      |  2 ++
 hw/vfio/user.h     |  1 +
 hw/vfio/pci.c      |  4 ++--
 hw/vfio/user-pci.c | 15 +++++++++++++++
 hw/vfio/user.c     | 12 ++++++++++++
 5 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index d3e5d5f..3607c6e 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -218,6 +218,8 @@ void vfio_teardown_msi(VFIOPCIDevice *vdev);
 void vfio_bars_exit(VFIOPCIDevice *vdev);
 void vfio_bars_finalize(VFIOPCIDevice *vdev);
 int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp);
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev);
+void vfio_pci_post_reset(VFIOPCIDevice *vdev);
 void vfio_put_device(VFIOPCIDevice *vdev);
 void vfio_register_err_notifier(VFIOPCIDevice *vdev);
 void vfio_register_req_notifier(VFIOPCIDevice *vdev);
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index ae7654f..b7a9f57 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -100,6 +100,7 @@ int vfio_user_validate_version(VFIOUserProxy *proxy, Error **errp);
 void vfio_user_send_reply(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int size);
 void vfio_user_send_error(VFIOUserProxy *proxy, VFIOUserHdr *hdr, int error);
 void vfio_user_putfds(VFIOUserMsg *msg);
+void vfio_user_reset(VFIOUserProxy *proxy);
 
 extern VFIODeviceIO vfio_dev_io_sock;
 extern VFIOContainerIO vfio_cont_io_sock;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7b16f8f..52fbfe6 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2225,7 +2225,7 @@ int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
     return 0;
 }
 
-static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t cmd;
@@ -2262,7 +2262,7 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
     vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
-static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
+void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
     VFIODevice *vbasedev = &vdev->vbasedev;
     Error *err = NULL;
diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c
index 6465b1c..ee018db 100644
--- a/hw/vfio/user-pci.c
+++ b/hw/vfio/user-pci.c
@@ -381,6 +381,20 @@ static void vfio_user_instance_finalize(Object *obj)
     }
 }
 
+static void vfio_user_pci_reset(DeviceState *dev)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
+
+    vfio_pci_pre_reset(vdev);
+
+    if (vbasedev->reset_works) {
+        vfio_user_reset(vbasedev->proxy);
+    }
+
+    vfio_pci_post_reset(vdev);
+}
+
 static Property vfio_user_pci_dev_properties[] = {
     DEFINE_PROP_STRING("socket", VFIOUserPCIDevice, sock_name),
     DEFINE_PROP_BOOL("no-direct-dma", VFIOUserPCIDevice, no_direct_dma, false),
@@ -394,6 +408,7 @@ static void vfio_user_pci_dev_class_init(ObjectClass *klass, void *data)
     DeviceClass *dc = DEVICE_CLASS(klass);
     PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
 
+    dc->reset = vfio_user_pci_reset;
     device_class_set_props(dc, vfio_user_pci_dev_properties);
     dc->desc = "VFIO over socket PCI device assignment";
     pdc->realize = vfio_user_pci_realize;
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index 3aabf6b..9b51686 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -1781,6 +1781,18 @@ static int vfio_user_region_write(VFIOUserProxy *proxy, uint8_t index,
     return ret;
 }
 
+void vfio_user_reset(VFIOUserProxy *proxy)
+{
+    VFIOUserHdr msg;
+
+    vfio_user_request_msg(&msg, VFIO_USER_DEVICE_RESET, sizeof(msg), 0);
+
+    vfio_user_send_wait(proxy, &msg, NULL, 0, false);
+    if (msg.flags & VFIO_USER_ERROR) {
+        error_printf("reset reply error %d\n", msg.error_reply);
+    }
+}
+
 
 /*
  * Socket-based io_ops
-- 
1.9.4
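Replies in this series reuse the request header in place, in both directions:

* `vfio_user_send_reply()` sets `flags` to `VFIO_USER_REPLY` and updates `size`.
* `vfio_user_send_error()` additionally sets `VFIO_USER_ERROR`, stores the errno in `error_reply`, and shrinks `size` to the header alone.
* On the client side, `vfio_user_reset()` checks the same `VFIO_USER_ERROR` bit on the reply it waits for.

The sketch below mirrors that header handling. The struct layout (including an `id` field for matching replies to requests) and the flag values are assumptions based on the vfio-user specification, not copied from `hw/vfio/user-protocol.h`. All `sketch_*` names and the helper functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed flag values: low nibble = message type, then no-reply and error bits. */
#define SKETCH_USER_REPLY     0x1
#define SKETCH_USER_NO_REPLY  0x10
#define SKETCH_USER_ERROR     0x20

/* Assumed header layout; the patch uses command, size, flags and error_reply. */
struct sketch_hdr {
    uint16_t id;           /* left untouched so the peer can match the reply */
    uint16_t command;      /* left untouched as well */
    uint32_t size;         /* total message size, header included */
    uint32_t flags;
    uint32_t error_reply;  /* errno, meaningful only when ERROR is set */
};

/* Like vfio_user_send_error(): header-only reply carrying an errno. */
void make_error_reply(struct sketch_hdr *hdr, int error)
{
    hdr->flags = SKETCH_USER_REPLY | SKETCH_USER_ERROR;
    hdr->error_reply = (uint32_t)error;
    hdr->size = sizeof(*hdr);
}

/* Like vfio_user_send_reply(): success reply of `size` bytes in total. */
int make_reply(struct sketch_hdr *hdr, uint32_t size)
{
    if (size < sizeof(*hdr)) {
        return -EINVAL;    /* the patch logs and drops the message here */
    }
    hdr->flags = SKETCH_USER_REPLY;
    hdr->size = size;
    return 0;
}

/* Caller side, as in vfio_user_reset(): 0 on success, else the server's errno. */
int reply_status(const struct sketch_hdr *reply)
{
    return (reply->flags & SKETCH_USER_ERROR) ? (int)reply->error_reply : 0;
}
```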
[127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 3124iC11007366; Thu, 2 Feb 2023 05:45:16 GMT Received: from iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (iadpaimrmta01.appoci.oracle.com [130.35.100.223]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3nfn9yj84k-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 02 Feb 2023 05:45:15 +0000 Received: from pps.filterd (iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com [127.0.0.1]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (8.17.1.5/8.17.1.5) with ESMTP id 31254Ppx013015; Thu, 2 Feb 2023 05:45:15 GMT Received: from bruckner.us.oracle.com (dhcp-10-65-133-23.vpn.oracle.com [10.65.133.23]) by iadpaimrmta01.imrmtpd1.prodappiadaev1.oraclevcn.com (PPS) with ESMTPS id 3nct5f5gb1-23 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Thu, 02 Feb 2023 05:45:15 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : in-reply-to : references; s=corp-2022-7-12; bh=iz+MH8zmguRnMEdN7jN/X9r4epUcgK6W52eyxmfWeLo=; b=XkVUf0ThypsAiF2dfLAehlwl+/bB/e27NnOWQyJcZ9Fi0lGP22iRetqy7dCF1Q3bTRit pL+0eJZ5oen7FTr+0HqM/pk+5Yr7ybf5ifb/p0rfLnytLhmtY6eWOn+gP/O9UCXqEarA Geyl9Z2Frw3+5rgj4CpKqBvxiPKy9QFziX6Kkft6vN3S/apVrX5p7074BxC3aBT7IzJP 3JMV4ZBmrHpd34Q49ISi9emZv23t1rafpqlGF364YE5FshXfvQBPIuvX31DuhVHgHhSn H09z7ywYfOatFSAw3S/5f82NtyqXzzw88CE3eB7/7h1QNtNxPdin6IJZPdAJ4Ej49EPq fg== From: John Johnson To: qemu-devel@nongnu.org Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org Subject: [PATCH v2 22/23] vfio-user: add 'x-msg-timeout' option that specifies msg wait times Date: Wed, 1 Feb 2023 21:55:58 -0800 Message-Id: X-Mailer: git-send-email 1.8.3.1 In-Reply-To: References: In-Reply-To: References: X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.122.1 
definitions=2023-02-01_15,2023-01-31_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 spamscore=0 phishscore=0 malwarescore=0 adultscore=0 bulkscore=0 mlxscore=0 mlxlogscore=999 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302020053 X-Proofpoint-ORIG-GUID: wTfl4qQnxUKCYmFORrYrxv_q1gbZ6XfC X-Proofpoint-GUID: wTfl4qQnxUKCYmFORrYrxv_q1gbZ6XfC Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer2=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=205.220.177.32; envelope-from=john.g.johnson@oracle.com; helo=mx0b-00069f02.pphosted.com X-Spam_score_int: -27 X-Spam_score: -2.8 X-Spam_bar: -- X-Spam_report: (-2.8 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer2=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer2=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @oracle.com) X-ZM-MESSAGEID: 1675316960204100001 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Signed-off-by: John G Johnson Signed-off-by: Elena Ufimtseva Signed-off-by: Jagannathan Raman --- hw/vfio/user.h | 1 + hw/vfio/user-pci.c | 4 ++++ hw/vfio/user.c | 7 ++++--- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/hw/vfio/user.h b/hw/vfio/user.h index b7a9f57..52b3f89 100644 --- a/hw/vfio/user.h +++ b/hw/vfio/user.h @@ -62,6 +62,7 @@ typedef struct VFIOUserProxy { uint64_t max_bitmap; uint64_t 
migr_pgsize; int flags; + uint32_t wait_time; QemuCond close_cv; AioContext *ctx; QEMUBH *req_bh; diff --git a/hw/vfio/user-pci.c b/hw/vfio/user-pci.c index ee018db..5d0b224 100644 --- a/hw/vfio/user-pci.c +++ b/hw/vfio/user-pci.c @@ -43,6 +43,7 @@ struct VFIOUserPCIDevice { bool no_direct_dma; /* disable shared mem for DMA */ bool send_queued; /* all sends are queued */ bool no_post; /* all regions write are sync */ + uint32_t wait_time; /* timeout for message replies */ }; =20 /* @@ -280,6 +281,8 @@ static void vfio_user_pci_realize(PCIDevice *pdev, Erro= r **errp) if (udev->no_post) { proxy->flags |=3D VFIO_PROXY_NO_POST; } + /* user specified or 5 sec default */ + proxy->wait_time =3D udev->wait_time; =20 vfio_user_validate_version(proxy, &err); if (err !=3D NULL) { @@ -400,6 +403,7 @@ static Property vfio_user_pci_dev_properties[] =3D { DEFINE_PROP_BOOL("no-direct-dma", VFIOUserPCIDevice, no_direct_dma, fa= lse), DEFINE_PROP_BOOL("x-send-queued", VFIOUserPCIDevice, send_queued, fals= e), DEFINE_PROP_BOOL("x-no-posted-writes", VFIOUserPCIDevice, no_post, fal= se), + DEFINE_PROP_UINT32("x-msg-timeout", VFIOUserPCIDevice, wait_time, 5000= ), DEFINE_PROP_END_OF_LIST(), }; =20 diff --git a/hw/vfio/user.c b/hw/vfio/user.c index 9b51686..af5471b 100644 --- a/hw/vfio/user.c +++ b/hw/vfio/user.c @@ -44,7 +44,6 @@ #define VFIO_USER_MAX_REGIONS 100 #define VFIO_USER_MAX_IRQS 50 =20 -static int wait_time =3D 5000; /* wait up to 5 sec for busy servers */ static IOThread *vfio_user_iothread; =20 static void vfio_user_shutdown(VFIOUserProxy *proxy); @@ -735,7 +734,8 @@ static void vfio_user_send_wait(VFIOUserProxy *proxy, V= FIOUserHdr *hdr, =20 if (ret =3D=3D 0) { while (!msg->complete) { - if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, wait_time)) { + if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, + proxy->wait_time)) { VFIOUserMsgQ *list; =20 list =3D msg->pending ? 
&proxy->pending : &proxy->outgoing; @@ -778,7 +778,8 @@ static void vfio_user_wait_reqs(VFIOUserProxy *proxy) msg->type =3D VFIO_MSG_WAIT; proxy->last_nowait =3D NULL; while (!msg->complete) { - if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, wait_time)) { + if (!qemu_cond_timedwait(&msg->cv, &proxy->lock, + proxy->wait_time)) { VFIOUserMsgQ *list; =20 list =3D msg->pending ? &proxy->pending : &proxy->outgoing; --=20 1.9.4 From nobody Tue Apr 23 23:23:02 2024 Delivered-To: importer2@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=oracle.com ARC-Seal: i=1; a=rsa-sha256; t=1675316859; cv=none; d=zohomail.com; s=zohoarc; b=bly5RwPhuoAVraGstoSpkkJ5ykQiCwrpjQq0UflsU1rojfhQW7iZ+ssZnCODan7Ab8j7VnttKG6DqlVu3mXvDoxH4mPiNi82wXvgHaAOJJePGXYgqiCsGY7jNhZGmVBEknUNe8TIC/lO44wgvET+NQAXh1/MNvILT239NGux/1E= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1675316859; h=Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:Message-ID:References:Sender:Subject:To; bh=gtARWBiR2+X07fX6RUv+cUrN8v2gIHvt6wNYGu89IbQ=; b=A+56vCZS7YskUm9TxDYxEWr+dLxtDDELvqq4avwTAzktODqn+cs2muGVwQnCHUhilirkw9JG0XwW9M1OTMZ7UhQChsz1YRFlSZjK4afhm/RUH1wsUstQGt5dQsOWAY5ByAhEmpOqQFEB2DqGIOoc6Zxlunb4n4AgrxAgKoP6KAU= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 167531685929734.2954911825168; Wed, 1 Feb 2023 21:47:39 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp 

From: John Johnson
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, philmd@linaro.org
Subject: [PATCH v2 23/23] vfio-user: add coalesced posted writes
Date: Wed, 1 Feb 2023 21:55:59 -0800
Message-Id: <5f30b3eb2ee44c772c0d89cce42a3f0f3a57a3c2.1675228037.git.john.g.johnson@oracle.com>

Add a new message, VFIO_USER_REGION_WRITE_MULTI, that carries multiple
writes to the server at once. This prevents the outgoing queue from
overflowing when a long-latency operation is followed by a series of
posted writes.

Support is negotiated through the new "write_multiple" capability.
Posted writes of at most VFIO_USER_MULTI_DATA bytes start being
coalesced once the outgoing queue holds more than VFIO_USER_OUT_HIGH
messages. The pending batch is sent when it reaches VFIO_USER_MULTI_MAX
writes, when a write is added while the queue is below VFIO_USER_OUT_LOW,
when the outgoing queue empties, or ahead of any other request.

Signed-off-by: John G Johnson
Signed-off-by: Elena Ufimtseva
Signed-off-by: Jagannathan Raman
---
 hw/vfio/user-protocol.h |  21 ++++++++
 hw/vfio/user.h          |   7 +++
 hw/vfio/user.c          | 130 +++++++++++++++++++++++++++++++++++++++++++++++-
 hw/vfio/trace-events    |   1 +
 4 files changed, 157 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/user-protocol.h b/hw/vfio/user-protocol.h
index 1a40cca..d09c29e 100644
--- a/hw/vfio/user-protocol.h
+++ b/hw/vfio/user-protocol.h
@@ -40,6 +40,7 @@ enum vfio_user_command {
     VFIO_USER_DMA_WRITE                 = 12,
     VFIO_USER_DEVICE_RESET              = 13,
     VFIO_USER_DIRTY_PAGES               = 14,
+    VFIO_USER_REGION_WRITE_MULTI        = 15,
     VFIO_USER_MAX,
 };
 
@@ -73,6 +74,7 @@ typedef struct {
 #define VFIO_USER_CAP_PGSIZES   "pgsizes"
 #define VFIO_USER_CAP_MAP_MAX   "max_dma_maps"
 #define VFIO_USER_CAP_MIGR      "migration"
+#define VFIO_USER_CAP_MULTI     "write_multiple"
 
 /* "migration" members */
 #define VFIO_USER_CAP_PGSIZE            "pgsize"
@@ -219,4 +221,23 @@ typedef struct {
     char data[];
 } VFIOUserBitmap;
 
+/*
+ * VFIO_USER_REGION_WRITE_MULTI
+ */
+#define VFIO_USER_MULTI_DATA  8
+#define VFIO_USER_MULTI_MAX   200
+
+typedef struct {
+    uint64_t offset;
+    uint32_t region;
+    uint32_t count;
+    char data[VFIO_USER_MULTI_DATA];
+} VFIOUserWROne;
+
+typedef struct {
+    VFIOUserHdr hdr;
+    uint64_t wr_cnt;
+    VFIOUserWROne wrs[VFIO_USER_MULTI_MAX];
+} VFIOUserWRMulti;
+
 #endif /* VFIO_USER_PROTOCOL_H */
diff --git a/hw/vfio/user.h b/hw/vfio/user.h
index 52b3f89..a5cf969 100644
--- a/hw/vfio/user.h
+++ b/hw/vfio/user.h
@@ -79,6 +79,8 @@ typedef struct VFIOUserProxy {
     VFIOUserMsg *last_nowait;
     VFIOUserMsg *part_recv;
     size_t recv_left;
+    VFIOUserWRMulti *wr_multi;
+    int num_outgoing;
     enum proxy_state state;
 } VFIOUserProxy;
 
@@ -87,6 +89,11 @@ typedef struct VFIOUserProxy {
 #define VFIO_PROXY_NO_MMAP       0x2
 #define VFIO_PROXY_FORCE_QUEUED  0x4
 #define VFIO_PROXY_NO_POST       0x8
+#define VFIO_PROXY_USE_MULTI     0x10
+
+/* coalescing high and low water marks for VFIOProxy num_outgoing */
+#define VFIO_USER_OUT_HIGH       1024
+#define VFIO_USER_OUT_LOW        128
 
 VFIOUserProxy *vfio_user_connect_dev(SocketAddress *addr, Error **errp);
 void vfio_user_disconnect(VFIOUserProxy *proxy);
diff --git a/hw/vfio/user.c b/hw/vfio/user.c
index af5471b..bcdfccf 100644
--- a/hw/vfio/user.c
+++ b/hw/vfio/user.c
@@ -70,6 +70,7 @@ static void vfio_user_send_wait(VFIOUserProxy *proxy, VFIOUserHdr *hdr,
 static void vfio_user_wait_reqs(VFIOUserProxy *proxy);
 static void vfio_user_request_msg(VFIOUserHdr *hdr, uint16_t cmd,
                                   uint32_t size, uint32_t flags);
+static void vfio_user_flush_multi(VFIOUserProxy *proxy);
 
 static int vfio_user_get_info(VFIOUserProxy *proxy,
                               struct vfio_device_info *info);
@@ -476,6 +477,11 @@ static void vfio_user_send(void *opaque)
         }
         qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
                                        vfio_user_recv, NULL, proxy);
+
+        /* queue empty - send any pending multi write msgs */
+        if (proxy->wr_multi != NULL) {
+            vfio_user_flush_multi(proxy);
+        }
     }
 }
 
@@ -496,6 +502,7 @@ static int vfio_user_send_one(VFIOUserProxy *proxy)
     }
 
     QTAILQ_REMOVE(&proxy->outgoing, msg, next);
+    proxy->num_outgoing--;
     if (msg->type == VFIO_MSG_ASYNC) {
         vfio_user_recycle(proxy, msg);
     } else {
@@ -603,11 +610,18 @@ static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg)
 {
     int ret;
 
+    /* older coalesced writes go first */
+    if (proxy->wr_multi != NULL &&
+        ((msg->hdr->flags & VFIO_USER_TYPE) == VFIO_USER_REQUEST)) {
+        vfio_user_flush_multi(proxy);
+    }
+
     /*
      * Unsent outgoing msgs - add to tail
      */
     if (!QTAILQ_EMPTY(&proxy->outgoing)) {
         QTAILQ_INSERT_TAIL(&proxy->outgoing, msg, next);
+        proxy->num_outgoing++;
         return 0;
     }
 
@@ -621,6 +635,7 @@ static int vfio_user_send_queued(VFIOUserProxy *proxy, VFIOUserMsg *msg)
     }
     if (ret == QIO_CHANNEL_ERR_BLOCK) {
         QTAILQ_INSERT_HEAD(&proxy->outgoing, msg, next);
+        proxy->num_outgoing = 1;
         qio_channel_set_aio_fd_handler(proxy->ioc, proxy->ctx,
                                        vfio_user_recv, vfio_user_send,
                                        proxy);
@@ -1326,12 +1341,27 @@ static int check_migr(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
     return caps_parse(proxy, qdict, caps_migr, errp);
 }
 
+static int check_multi(VFIOUserProxy *proxy, QObject *qobj, Error **errp)
+{
+    QBool *qb = qobject_to(QBool, qobj);
+
+    if (qb == NULL) {
+        error_setg(errp, "malformed %s", VFIO_USER_CAP_MULTI);
+        return -1;
+    }
+    if (qbool_get_bool(qb)) {
+        proxy->flags |= VFIO_PROXY_USE_MULTI;
+    }
+    return 0;
+}
+
 static struct cap_entry caps_cap[] = {
     { VFIO_USER_CAP_MAX_FDS, check_max_fds },
     { VFIO_USER_CAP_MAX_XFER, check_max_xfer },
     { VFIO_USER_CAP_PGSIZES, check_pgsizes },
     { VFIO_USER_CAP_MAP_MAX, check_max_dma },
     { VFIO_USER_CAP_MIGR, check_migr },
+    { VFIO_USER_CAP_MULTI, check_multi },
     { NULL }
 };
 
@@ -1390,6 +1420,7 @@ static GString *caps_json(void)
     qdict_put_int(capdict, VFIO_USER_CAP_MAX_XFER, VFIO_USER_DEF_MAX_XFER);
     qdict_put_int(capdict, VFIO_USER_CAP_PGSIZES, VFIO_USER_DEF_PGSIZE);
     qdict_put_int(capdict, VFIO_USER_CAP_MAP_MAX, VFIO_USER_DEF_MAP_MAX);
+    qdict_put_bool(capdict, VFIO_USER_CAP_MULTI, true);
 
     qdict_put_obj(dict, VFIO_USER_CAP, QOBJECT(capdict));
 
@@ -1744,19 +1775,114 @@ static int vfio_user_region_read(VFIOUserProxy *proxy, uint8_t index,
     return msgp->count;
 }
 
+static void vfio_user_flush_multi(VFIOUserProxy *proxy)
+{
+    VFIOUserMsg *msg;
+    VFIOUserWRMulti *wm = proxy->wr_multi;
+    int ret;
+
+    proxy->wr_multi = NULL;
+
+    /* adjust size for actual # of writes */
+    wm->hdr.size -= (VFIO_USER_MULTI_MAX - wm->wr_cnt) * sizeof(VFIOUserWROne);
+
+    msg = vfio_user_getmsg(proxy, &wm->hdr, NULL);
+    msg->id = wm->hdr.id;
+    msg->rsize = 0;
+    msg->type = VFIO_MSG_ASYNC;
+    trace_vfio_user_wrmulti("flush", wm->wr_cnt);
+
+    ret = vfio_user_send_queued(proxy, msg);
+    if (ret < 0) {
+        vfio_user_recycle(proxy, msg);
+    }
+}
+
+static void vfio_user_create_multi(VFIOUserProxy *proxy)
+{
+    VFIOUserWRMulti *wm;
+
+    wm = g_malloc0(sizeof(*wm));
+    vfio_user_request_msg(&wm->hdr, VFIO_USER_REGION_WRITE_MULTI,
+                          sizeof(*wm), VFIO_USER_NO_REPLY);
+    proxy->wr_multi = wm;
+}
+
+static void vfio_user_add_multi(VFIOUserProxy *proxy, uint8_t index,
+                                off_t offset, uint32_t count, void *data)
+{
+    VFIOUserWRMulti *wm = proxy->wr_multi;
+    VFIOUserWROne *w1 = &wm->wrs[wm->wr_cnt];
+
+    w1->offset = offset;
+    w1->region = index;
+    w1->count = count;
+    memcpy(&w1->data, data, count);
+
+    wm->wr_cnt++;
+    trace_vfio_user_wrmulti("add", wm->wr_cnt);
+    if (wm->wr_cnt == VFIO_USER_MULTI_MAX ||
+        proxy->num_outgoing < VFIO_USER_OUT_LOW) {
+        vfio_user_flush_multi(proxy);
+    }
+}
+
 static int vfio_user_region_write(VFIOUserProxy *proxy, uint8_t index,
                                   off_t offset, uint32_t count,
                                   void *data, bool post)
 {
     VFIOUserRegionRW *msgp = NULL;
-    int flags = post ? VFIO_USER_NO_REPLY : 0;
+    int flags;
     int size = sizeof(*msgp) + count;
+    bool can_multi;
     int ret;
 
     if (count > proxy->max_xfer_size) {
         return -EINVAL;
     }
 
+    if (proxy->flags & VFIO_PROXY_NO_POST) {
+        post = false;
+    }
+
+    /* write eligible to be in a WRITE_MULTI msg ? */
+    can_multi = (proxy->flags & VFIO_PROXY_USE_MULTI) && post &&
+        count <= VFIO_USER_MULTI_DATA;
+
+    /*
+     * This should be a rare case, so first check without the lock,
+     * if we're wrong, vfio_user_send_queued() will flush any posted writes
+     * we missed here
+     */
+    if (proxy->wr_multi != NULL ||
+        (proxy->num_outgoing > VFIO_USER_OUT_HIGH && can_multi)) {
+
+        /*
+         * re-check with lock
+         *
+         * if already building a WRITE_MULTI msg,
+         *  add this one if possible else flush pending before
+         *  sending the current one
+         *
+         * else if outgoing queue is over the highwater,
+         *  start a new WRITE_MULTI message
+         */
+        WITH_QEMU_LOCK_GUARD(&proxy->lock) {
+            if (proxy->wr_multi != NULL) {
+                if (can_multi) {
+                    vfio_user_add_multi(proxy, index, offset, count, data);
+                    return count;
+                }
+                vfio_user_flush_multi(proxy);
+            } else if (proxy->num_outgoing > VFIO_USER_OUT_HIGH && can_multi) {
+                vfio_user_create_multi(proxy);
+                vfio_user_add_multi(proxy, index, offset, count, data);
+                return count;
+            }
+        }
+    }
+
+    flags = post ? VFIO_USER_NO_REPLY : 0;
     msgp = g_malloc0(size);
     vfio_user_request_msg(&msgp->hdr, VFIO_USER_REGION_WRITE, size, flags);
     msgp->offset = offset;
@@ -1766,7 +1892,7 @@ static int vfio_user_region_write(VFIOUserProxy *proxy, uint8_t index,
     trace_vfio_user_region_rw(msgp->region, msgp->offset, msgp->count);
 
     /* async send will free msg after it's sent */
-    if (post && !(proxy->flags & VFIO_PROXY_NO_POST)) {
+    if (post) {
         vfio_user_send_async(proxy, &msgp->hdr, NULL);
         return count;
     }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a4e02ff..e1e9681 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -180,3 +180,4 @@ vfio_user_get_irq_info(uint32_t index, uint32_t flags, uint32_t count) " index %
 vfio_user_set_irqs(uint32_t index, uint32_t start, uint32_t count, uint32_t flags) " index %d start %d count %d flags 0x%x"
 vfio_user_dma_map(uint64_t iova, uint64_t size, uint64_t off, uint32_t flags, bool will_commit) " iova 0x%"PRIx64" size 0x%"PRIx64" off 0x%"PRIx64" flags 0x%x will_commit %d"
 vfio_user_dma_unmap(uint64_t iova, uint64_t size, uint32_t flags, bool dirty, bool will_commit) " iova 0x%"PRIx64" size 0x%"PRIx64" flags 0x%x dirty %d will_commit %d"
+vfio_user_wrmulti(const char *s, uint64_t wr_cnt) " %s count 0x%"PRIx64
-- 
1.9.4
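
For readers who want to experiment with the coalescing thresholds from patch 23/23 outside QEMU, here is a minimal standalone sketch of the decision logic. It is not the QEMU code: there is no socket, lock, or real outgoing queue, only an integer standing in for `num_outgoing`. The constants and the write-record and batch layouts are copied from the patch; the 16-byte header layout (`id`, `command`, `size`, `flags`, `error_reply`) is an assumption based on the vfio-user protocol specification, and every `model_*` type and helper function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants copied from the patch. */
#define VFIO_USER_MULTI_DATA 8
#define VFIO_USER_MULTI_MAX  200
#define VFIO_USER_OUT_HIGH   1024
#define VFIO_USER_OUT_LOW    128

/* Hypothetical stand-in for the 16-byte vfio-user header (assumed layout). */
typedef struct {
    uint16_t id;
    uint16_t command;
    uint32_t size;
    uint32_t flags;
    uint32_t error_reply;
} model_hdr;

/* Same layout as VFIOUserWROne in the patch. */
typedef struct {
    uint64_t offset;
    uint32_t region;
    uint32_t count;
    char data[VFIO_USER_MULTI_DATA];
} model_wr_one;

/* Same layout as VFIOUserWRMulti in the patch. */
typedef struct {
    model_hdr hdr;
    uint64_t wr_cnt;
    model_wr_one wrs[VFIO_USER_MULTI_MAX];
} model_batch;

/* Only small posted writes on a multi-capable connection may be coalesced. */
static int can_multi(int use_multi, int post, uint32_t count)
{
    return use_multi && post && count <= VFIO_USER_MULTI_DATA;
}

/* A new batch is started only while the queue is above the high-water mark. */
static int should_start_batch(int num_outgoing, int eligible)
{
    return num_outgoing > VFIO_USER_OUT_HIGH && eligible;
}

/* Bytes actually sent once unused slots are trimmed off the fixed buffer. */
static size_t trimmed_size(uint64_t wr_cnt)
{
    return sizeof(model_batch) -
           (VFIO_USER_MULTI_MAX - wr_cnt) * sizeof(model_wr_one);
}

/*
 * Append one write to the batch. Returns 1 if the batch should be
 * flushed now: it is full, or the queue is below the low-water mark
 * (the same test vfio_user_add_multi() performs in the patch).
 */
static int batch_add(model_batch *b, int num_outgoing, uint32_t region,
                     uint64_t offset, const void *data, uint32_t count)
{
    model_wr_one *w1;

    assert(b->wr_cnt < VFIO_USER_MULTI_MAX);
    assert(count <= VFIO_USER_MULTI_DATA);
    w1 = &b->wrs[b->wr_cnt++];
    w1->offset = offset;
    w1->region = region;
    w1->count = count;
    memcpy(w1->data, data, count);
    return b->wr_cnt == VFIO_USER_MULTI_MAX || num_outgoing < VFIO_USER_OUT_LOW;
}
```

Trimming the unused slots before sending, as `trimmed_size()` does here and `vfio_user_flush_multi()` does in the patch, means that under these layout assumptions a batch of three writes occupies 96 bytes on the wire rather than the full 4824-byte buffer.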