From: Avihai Horon
To:
CC: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
 Richard Henderson, David Hildenbrand, "Ilya Leoshkevich", Thomas Huth,
 "Juan Quintela", "Dr. David Alan Gilbert", "Michael S. Tsirkin",
 Cornelia Huck, Paolo Bonzini, Stefan Hajnoczi, Fam Zheng, Eric Blake,
 Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater, Yishai Hadas,
 Jason Gunthorpe, Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
 Joao Martins
Subject: [PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol
Date: Thu, 12 Jan 2023 10:50:20 +0200
Message-ID: <20230112085020.15866-14-avihaih@nvidia.com>
In-Reply-To: <20230112085020.15866-1-avihaih@nvidia.com>
References: <20230112085020.15866-1-avihaih@nvidia.com>

Now that VFIO migration protocol v2 has been implemented and v1 protocol
has been removed, update the documentation according to v2 protocol.

Signed-off-by: Avihai Horon
Reviewed-by: Cédric Le Goater
---
 docs/devel/vfio-migration.rst | 68 ++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..1d50c2fe5f 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved state on the
 destination host. This document details how saving and restoring of VFIO
 devices is done in QEMU.
 
-Migration of VFIO devices consists of two phases: the optional pre-copy phase,
-and the stop-and-copy phase. The pre-copy phase is iterative and allows to
-accommodate VFIO devices that have a large amount of data that needs to be
-transferred. The iterative pre-copy phase of migration allows for the guest to
-continue whilst the VFIO device state is transferred to the destination, this
-helps to reduce the total downtime of the VM. VFIO devices can choose to skip
-the pre-copy phase of migration by returning pending_bytes as zero during the
-pre-copy phase.
+Migration of VFIO devices currently consists of a single stop-and-copy phase.
+During the stop-and-copy phase the guest is stopped and the entire VFIO device
+data is transferred to the destination.
+
+The pre-copy phase of migration is currently not supported for VFIO devices.
+Support for VFIO pre-copy will be added later on.
 
 A detailed description of the UAPI for VFIO device migration can be found in
-the comment for the ``vfio_device_migration_info`` structure in the header
-file linux-headers/linux/vfio.h.
+the comment for the ``vfio_device_mig_state`` structure in the header file
+linux-headers/linux/vfio.h.
 
 VFIO implements the device hooks for the iterative approach as follows:
 
-* A ``save_setup`` function that sets up the migration region and sets _SAVING
-  flag in the VFIO device state.
+* A ``save_setup`` function that sets up migration on the source.
 
-* A ``load_setup`` function that sets up the migration region on the
-  destination and sets _RESUMING flag in the VFIO device state.
+* A ``load_setup`` function that sets the VFIO device on the destination in
+  _RESUMING state.
 
 * A ``save_live_pending`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
-* A ``save_live_iterate`` function that reads the VFIO device's data from the
-  vendor driver through the migration region during iterative phase.
-
 * A ``save_state`` function to save the device config space if it is present.
 
-* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
-  VFIO device state and iteratively copies the remaining data for the VFIO
-  device until the vendor driver indicates that no data remains (pending bytes
-  is zero).
+* A ``save_live_complete_precopy`` function that sets the VFIO device in
+  _STOP_COPY state and iteratively copies the data for the VFIO device until
+  the vendor driver indicates that no data remains.
 
 * A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above
+  sections that are generated by the save functions above.
 
 * ``cleanup`` functions for both save and load that perform any migration
-  related cleanup, including unmapping the migration region
+  related cleanup.
 
 
 The VFIO migration code uses a VM state change handler to change the VFIO
@@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver
 can also be written by the device. There is currently no device or IOMMU
 support for dirty page tracking in hardware.
 
-By default, dirty pages are tracked when the device is in pre-copy as well as
-stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
-the destination in both phases. Copying dirty pages in pre-copy phase helps
-QEMU to predict if it can achieve its downtime tolerances. If QEMU during
-pre-copy phase keeps finding dirty pages continuously, then it understands
-that even in stop-and-copy phase, it is likely to find dirty pages and can
-predict the downtime accordingly.
+By default, dirty pages are tracked during pre-copy as well as stop-and-copy
+phase. So, a page pinned by the vendor driver will be copied to the destination
+in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
+it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
+finding dirty pages continuously, then it understands that even in stop-and-copy
+phase, it is likely to find dirty pages and can predict the downtime
+accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables querying the dirty bitmap during pre-copy phase. If it is set to
@@ -111,23 +104,22 @@ Live migration save path
                                   |
                      migrate_init spawns migration_thread
                 Migration thread then calls each device's .save_setup()
-                    (RUNNING, _SETUP, _RUNNING|_SAVING)
+                       (RUNNING, _SETUP, _RUNNING)
                                   |
-                    (RUNNING, _ACTIVE, _RUNNING|_SAVING)
+                      (RUNNING, _ACTIVE, _RUNNING)
              If device is active, get pending_bytes by .save_live_pending()
           If total pending_bytes >= threshold_size, call .save_live_iterate()
-                  Data of VFIO device for pre-copy phase is copied
         Iterate till total pending bytes converge and are less than threshold
                                   |
   On migration completion, vCPU stops and calls .save_live_complete_precopy for
-   each active device. The VFIO device is then transitioned into _SAVING state
-                   (FINISH_MIGRATE, _DEVICE, _SAVING)
+  each active device. The VFIO device is then transitioned into _STOP_COPY state
+                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
                                   |
      For the VFIO device, iterate in .save_live_complete_precopy until
                          pending data is 0
-                   (FINISH_MIGRATE, _DEVICE, _STOPPED)
+                   (FINISH_MIGRATE, _DEVICE, _STOP)
                                   |
-                 (FINISH_MIGRATE, _COMPLETED, _STOPPED)
+                 (FINISH_MIGRATE, _COMPLETED, _STOP)
              Migraton thread schedules cleanup bottom half and exits
 
 Live migration resume path
@@ -136,7 +128,7 @@ Live migration resume path
 ::
 
                         Incoming migration calls .load_setup for each device
-                       (RESTORE_VM, _ACTIVE, _STOPPED)
+                       (RESTORE_VM, _ACTIVE, _STOP)
                                  |
        For each device, .load_state is called for that device section data
                        (RESTORE_VM, _ACTIVE, _RESUMING)
-- 
2.26.3
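
For readers following the updated save path, the stop-and-copy step described in this patch can be modelled roughly as below. This is only an illustrative Python sketch, not QEMU or kernel code: `FakeVfioDevice`, `read_chunk` and the `chunk_size` parameter are hypothetical names introduced here. Folding the final transition to `_STOP` into the same function is a simplification of the diagram. Only the state names `_RUNNING`, `_STOP_COPY` and `_STOP` and the "copy until no data remains" loop come from the documentation text.

```python
from dataclasses import dataclass

# Device state names as they appear in the updated document (v2 protocol).
RUNNING, STOP_COPY, STOP = "_RUNNING", "_STOP_COPY", "_STOP"


@dataclass
class FakeVfioDevice:
    """Hypothetical stand-in for a VFIO device; not a QEMU or kernel type."""
    data: bytes
    state: str = RUNNING
    offset: int = 0

    def read_chunk(self, size: int) -> bytes:
        """Return up to `size` bytes of device data; empty means none remains."""
        chunk = self.data[self.offset:self.offset + size]
        self.offset += len(chunk)
        return chunk


def save_live_complete_precopy(dev: FakeVfioDevice, chunk_size: int = 4):
    """Model of stop-and-copy: enter _STOP_COPY, drain the data, end in _STOP."""
    trace = [dev.state]
    dev.state = STOP_COPY
    trace.append(dev.state)

    stream = bytearray()
    while True:
        chunk = dev.read_chunk(chunk_size)
        if not chunk:  # the modelled vendor driver reports no data remains
            break
        stream += chunk

    dev.state = STOP
    trace.append(dev.state)
    return bytes(stream), trace
```

Calling `save_live_complete_precopy(FakeVfioDevice(data=b"device-state"))` returns the full byte string together with the sequence of device states visited, which mirrors the last column of the state triples in the save path diagram.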