From nobody Wed Feb 11 07:12:43 2026 Delivered-To: importer2@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; arc=pass (i=1 dmarc=pass fromdomain=nvidia.com); dmarc=pass(p=reject dis=none) header.from=nvidia.com ARC-Seal: i=2; a=rsa-sha256; t=1673807890; cv=pass; d=zohomail.com; s=zohoarc; b=RAWJIzbi30JOashLWms2CZ/LBqofwo4H8B1dQV8qrAoqaV+mCh+zWWgqASOZDmBVBFWc6h0t8dtbzB0HLazl6yvOXnzDs3m5X9ksWlcZEJxzAI9tFGHHk6fa9C96+EvfWir7GEbTwi2C64gKW3mIqCspfkWEpJw+oc6EnDVeLgY= ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1673807890; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=IIbiCHyN+VAgv65R/v0r1jLKNFBx84IPLz1EDxgnX0M=; b=G15VukPQ7pPRyr5gFDHI279YMFV/z6rzEehRSoVz22cBPjtUAiV0RwEik13tsfXcFvatTv2mA2H6FtMXubthJtCqOMGWgzcIPX+8FOCGnYhmOg12USNeIqDFSHP2wGKz40RJhrKJQ3zvBuU6qKxJuBq1c4XnF/sOOLttR9tMkO8= ARC-Authentication-Results: i=2; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; arc=pass (i=1 dmarc=pass fromdomain=nvidia.com); dmarc=pass header.from= (p=reject dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1673807890905239.31982562794678; Sun, 15 Jan 2023 10:38:10 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pH7t9-0000Ue-FE; Sun, 15 Jan 2023 13:37:51 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pH7sz-00082x-G1; Sun, 15 Jan 2023 13:37:43 -0500 Received: from mail-bn7nam10on2062.outbound.protection.outlook.com ([40.107.92.62] helo=NAM10-BN7-obe.outbound.protection.outlook.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pH7sv-0007mp-HR; Sun, 15 Jan 2023 13:37:41 -0500 Received: from DS7PR03CA0002.namprd03.prod.outlook.com (2603:10b6:5:3b8::7) by SJ2PR12MB8135.namprd12.prod.outlook.com (2603:10b6:a03:4f3::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.5986.23; Sun, 15 Jan 2023 18:37:33 +0000 Received: from DM6NAM11FT062.eop-nam11.prod.protection.outlook.com (2603:10b6:5:3b8:cafe::b3) by DS7PR03CA0002.outlook.office365.com (2603:10b6:5:3b8::7) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6002.19 via Frontend Transport; Sun, 15 Jan 2023 18:37:33 +0000 Received: from mail.nvidia.com (216.228.118.232) by DM6NAM11FT062.mail.protection.outlook.com (10.13.173.40) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6002.13 via Frontend Transport; Sun, 15 Jan 2023 18:37:32 +0000 Received: from drhqmail201.nvidia.com (10.126.190.180) by mail.nvidia.com (10.127.129.5) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Sun, 15 Jan 2023 10:37:32 -0800 Received: from drhqmail202.nvidia.com (10.126.190.181) by drhqmail201.nvidia.com (10.126.190.180) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.36; Sun, 15 Jan 2023 10:37:31 -0800 Received: from vdi.nvidia.com (10.127.8.9) by mail.nvidia.com (10.126.190.181) with Microsoft SMTP Server id 15.2.986.36 via Frontend Transport; Sun, 15 Jan 2023 10:37:25 -0800 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=dt9JXn0wypSlMv1T5aDqVmX+cyiJC2fIgEk/E/5jrWBZATMitkyi1WRa6A1XWIKSDX792hltYh4mqnkFsbk6jFUS61m4Vc60pbFpsXl1RosZ5PW0tz8rw69yx/AFFcmAHgOYSnfGMHzPZqhSOWDk3dp9l2eR7FC/im4oqyFXPgkFXaA/FIsnlR3aMrCK8D6Q2XQpqqybl8s5PaQCYf3/HskJQqAkxMBvjoeZ8qMRL79Gj/ko9AK7a41F/b0EYgla+JnaQgMUXJ/SKMqX5UbegDm/wp/LAQcW2LSv5oK80GWufmfCnY+4ePPR7Q0hz/yzgs8GnNNWyUnTvhjEMJy3tw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=IIbiCHyN+VAgv65R/v0r1jLKNFBx84IPLz1EDxgnX0M=; b=IZHcYZQ+u4SX3HbpcFXpiIxyPvsPjWt+GWxwwigpuqcGcTZv6Xnd3Z0lQb0mmGymhzypsG5K8dgUKgn1mir56SkwS7ZkCJMEqi7avS+tpRLur7sMeEi5H8ow90J/RwGdF+7Xu0Tljb6ddy1I2/j4M8W3c0f0miHY/V6qXITVvYmE0+rzi1Kw2Dgz7Ugb4EhxsbqogL4D91/JLMsDlRZEGZu5JvENVZxQlZf9CZR8pjgz91s+cE08my2GD8GbSpfPlmFTphyP/4CiX8gouJGHrNs1+xNhym6Vc1oKi7GobF0ozoOTgTwjGbHH0cGqWIlLCy4g41bxiHPoh5Mdn02QEQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.118.232) smtp.rcpttodomain=nongnu.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=IIbiCHyN+VAgv65R/v0r1jLKNFBx84IPLz1EDxgnX0M=; b=EGxuYoYk3AHm1+xuRhaxLkQ9h51GpKBU99zV0z5kp2sDgnVObegRcS8jMqVbOPTLgGuglNd+mhrcnjNulTI/wlLyHAlmPoxT3J8bphvgd9JtfdmLifl2PlDv/EmWmvo6/jKYLXe+m/No9fUxiaPu10WpJgNbB6MZFSr7wyIV/zJuTHH8+Qq6N7r1Lh54/xdq+hgSnSqwkewOdQLqsgVuC9lCriyZJABzc2bAib0oQbmhnFQnHdnuR3WWFIWDa90fO5/zMHgW60g42zA5jH2LlDjWKF16xgFARboyZ5XbmL0LwN9JaboLuS8E3dUs6YaeJv6RnOlEacvoId5Ci1fhIQ== X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.118.232) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer2=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.118.232 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.118.232; helo=mail.nvidia.com; pr=C From: Avihai Horon To: CC: Alex Williamson , Halil Pasic , Christian Borntraeger , Eric Farman , Richard Henderson , David Hildenbrand , "Ilya Leoshkevich" , Thomas Huth , "Juan Quintela" , "Dr. David Alan Gilbert" , "Michael S. Tsirkin" , Cornelia Huck , Paolo Bonzini , Stefan Hajnoczi , Fam Zheng , Eric Blake , Vladimir Sementsov-Ogievskiy , John Snow , =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= , , , Yishai Hadas , Jason Gunthorpe , Maor Gottlieb , Avihai Horon , Kirti Wankhede , Tarun Gupta , Joao Martins Subject: [PATCH v7 13/13] docs/devel: Align VFIO migration docs to v2 protocol Date: Sun, 15 Jan 2023 20:35:56 +0200 Message-ID: <20230115183556.7691-14-avihaih@nvidia.com> X-Mailer: git-send-email 2.21.3 In-Reply-To: <20230115183556.7691-1-avihaih@nvidia.com> References: <20230115183556.7691-1-avihaih@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DM6NAM11FT062:EE_|SJ2PR12MB8135:EE_ X-MS-Office365-Filtering-Correlation-Id: 14a7afde-e434-430b-baff-08daf72790b2 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: B7FeqGcR6/AukXz/iwfBHQxyw6yOfdzYVMEMnpiVCn1s++LmL5aBlmNGGqf4FmOMpX1vQM3AIQNADk2MEqbu01j04rPGp2JL06gsNX4byVFy47hyQ/J5hENMPftNC+AltD71mQxTDlsHNBvFGkgSpBIBz4hSFLZOaLqaJCXyMZRDMiU5xfqyuurmlnrQKIp8y2M6l+VHtRD1EN+zCdN4B08R7Lf1Dllq0ihqx9d4uFrt7oPY6zXKoO5zzUBN0jfmapxlxAxu95wuhGVpsiT6XV97FwMzaRnJm84Qh/KlK36hVdacdUdUl1wQfunPYgqSZDMmK9fNJg1n+a01KaTpzrUhkBlSIv//8f15Rul8vLoBnRgWeYCPGBYRN/h2W2U4d1IvZWMCQ10Rrh7BlfV4JWd5jJ3tHl7mo6gcdQy/AT96jZWPtJUN72jvkbDi4jxPUm/lWURvIRGRDjpJNLCG0pRju1THfyUPdQufntiooGvfwbLxoDKyrftynazzEkbPXNno6QA79VDZmSvhWJSNvTNbT0WdxcV+2FfoxQF+F2x7Sp0EJX8dT8j5BwcLLhsua/Zzv9eeTBWnMYkeRhO2AkwJ8RoZxoxWjEKyZACbhkuaNRfrX7sLmb02iX0BfBIq6GFjRgKANORpoxArR07E7hFYmM0kdRtk4E/UV6e77oHZCB4yOHtQQ9Aua8uf49ldKXLwjFJ5y/wUqUnC8kJehA== X-Forefront-Antispam-Report: CIP:216.228.118.232; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:dc7edge1.nvidia.com; CAT:NONE; SFS:(13230022)(4636009)(376002)(396003)(39860400002)(136003)(346002)(451199015)(40470700004)(46966006)(36840700001)(66574015)(54906003)(36756003)(316002)(82740400003)(478600001)(86362001)(7696005)(70586007)(4326008)(7636003)(70206006)(6916009)(8676002)(40480700001)(83380400001)(426003)(336012)(2616005)(40460700003)(82310400005)(356005)(47076005)(6666004)(1076003)(186003)(26005)(36860700001)(7416002)(5660300002)(8936002)(41300700001)(2906002); DIR:OUT; SFP:1101; X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 15 Jan 2023 18:37:32.8357 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 14a7afde-e434-430b-baff-08daf72790b2 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.118.232]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: DM6NAM11FT062.eop-nam11.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ2PR12MB8135 Received-SPF: softfail client-ip=40.107.92.62; envelope-from=avihaih@nvidia.com; helo=NAM10-BN7-obe.outbound.protection.outlook.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer2=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer2=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @Nvidia.com) X-ZM-MESSAGEID: 1673807892869100006 Now that VFIO migration protocol v2 has been implemented and v1 protocol has been removed, update the documentation according to v2 protocol. Signed-off-by: Avihai Horon Reviewed-by: C=C3=A9dric Le Goater --- docs/devel/vfio-migration.rst | 68 ++++++++++++++++------------------- 1 file changed, 30 insertions(+), 38 deletions(-) diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst index 9ff6163c88..1d50c2fe5f 100644 --- a/docs/devel/vfio-migration.rst +++ b/docs/devel/vfio-migration.rst @@ -7,46 +7,39 @@ the guest is running on source host and restoring this sa= ved state on the destination host. This document details how saving and restoring of VFIO devices is done in QEMU. =20 -Migration of VFIO devices consists of two phases: the optional pre-copy ph= ase, -and the stop-and-copy phase. The pre-copy phase is iterative and allows to -accommodate VFIO devices that have a large amount of data that needs to be -transferred. The iterative pre-copy phase of migration allows for the gues= t to -continue whilst the VFIO device state is transferred to the destination, t= his -helps to reduce the total downtime of the VM. VFIO devices can choose to s= kip -the pre-copy phase of migration by returning pending_bytes as zero during = the -pre-copy phase. +Migration of VFIO devices currently consists of a single stop-and-copy pha= se. +During the stop-and-copy phase the guest is stopped and the entire VFIO de= vice +data is transferred to the destination. + +The pre-copy phase of migration is currently not supported for VFIO device= s. +Support for VFIO pre-copy will be added later on. =20 A detailed description of the UAPI for VFIO device migration can be found = in -the comment for the ``vfio_device_migration_info`` structure in the header -file linux-headers/linux/vfio.h. +the comment for the ``vfio_device_mig_state`` structure in the header file +linux-headers/linux/vfio.h. =20 VFIO implements the device hooks for the iterative approach as follows: =20 -* A ``save_setup`` function that sets up the migration region and sets _SA= VING - flag in the VFIO device state. +* A ``save_setup`` function that sets up migration on the source. =20 -* A ``load_setup`` function that sets up the migration region on the - destination and sets _RESUMING flag in the VFIO device state. +* A ``load_setup`` function that sets the VFIO device on the destination in + _RESUMING state. =20 * A ``save_live_pending`` function that reads pending_bytes from the vendor driver, which indicates the amount of data that the vendor driver has ye= t to save for the VFIO device. =20 -* A ``save_live_iterate`` function that reads the VFIO device's data from = the - vendor driver through the migration region during iterative phase. - * A ``save_state`` function to save the device config space if it is prese= nt. =20 -* A ``save_live_complete_precopy`` function that resets _RUNNING flag from= the - VFIO device state and iteratively copies the remaining data for the VFIO - device until the vendor driver indicates that no data remains (pending b= ytes - is zero). +* A ``save_live_complete_precopy`` function that sets the VFIO device in + _STOP_COPY state and iteratively copies the data for the VFIO device unt= il + the vendor driver indicates that no data remains. =20 * A ``load_state`` function that loads the config section and the data - sections that are generated by the save functions above + sections that are generated by the save functions above. =20 * ``cleanup`` functions for both save and load that perform any migration - related cleanup, including unmapping the migration region + related cleanup. =20 =20 The VFIO migration code uses a VM state change handler to change the VFIO @@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinne= d by the vendor driver can also be written by the device. There is currently no device or IOMMU support for dirty page tracking in hardware. =20 -By default, dirty pages are tracked when the device is in pre-copy as well= as -stop-and-copy phase. So, a page pinned by the vendor driver will be copied= to -the destination in both phases. Copying dirty pages in pre-copy phase helps -QEMU to predict if it can achieve its downtime tolerances. If QEMU during -pre-copy phase keeps finding dirty pages continuously, then it understands -that even in stop-and-copy phase, it is likely to find dirty pages and can -predict the downtime accordingly. +By default, dirty pages are tracked during pre-copy as well as stop-and-co= py +phase. So, a page pinned by the vendor driver will be copied to the destin= ation +in both phases. Copying dirty pages in pre-copy phase helps QEMU to predic= t if +it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps +finding dirty pages continuously, then it understands that even in stop-an= d-copy +phase, it is likely to find dirty pages and can predict the downtime +accordingly. =20 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-track= ing`` which disables querying the dirty bitmap during pre-copy phase. If it is s= et to @@ -111,23 +104,22 @@ Live migration save path | migrate_init spawns migration_thread Migration thread then calls each device's .save_setup() - (RUNNING, _SETUP, _RUNNING|_SAVING) + (RUNNING, _SETUP, _RUNNING) | - (RUNNING, _ACTIVE, _RUNNING|_SAVING) + (RUNNING, _ACTIVE, _RUNNING) If device is active, get pending_bytes by .save_live_pending() If total pending_bytes >=3D threshold_size, call .save_live_iter= ate() - Data of VFIO device for pre-copy phase is copied Iterate till total pending bytes converge and are less than thresh= old | On migration completion, vCPU stops and calls .save_live_complete_precop= y for - each active device. The VFIO device is then transitioned into _SAVING s= tate - (FINISH_MIGRATE, _DEVICE, _SAVING) + each active device. The VFIO device is then transitioned into _STOP_COPY= state + (FINISH_MIGRATE, _DEVICE, _STOP_COPY) | For the VFIO device, iterate in .save_live_complete_precopy until pending data is 0 - (FINISH_MIGRATE, _DEVICE, _STOPPED) + (FINISH_MIGRATE, _DEVICE, _STOP) | - (FINISH_MIGRATE, _COMPLETED, _STOPPED) + (FINISH_MIGRATE, _COMPLETED, _STOP) Migraton thread schedules cleanup bottom half and exits =20 Live migration resume path @@ -136,7 +128,7 @@ Live migration resume path :: =20 Incoming migration calls .load_setup for each device - (RESTORE_VM, _ACTIVE, _STOPPED) + (RESTORE_VM, _ACTIVE, _STOP) | For each device, .load_state is called for that device section data (RESTORE_VM, _ACTIVE, _RESUMING) --=20 2.26.3