From: Avihai Horon
CC: Alex Williamson, Halil Pasic, Christian Borntraeger, Eric Farman,
 Richard Henderson, David Hildenbrand, Ilya Leoshkevich, Thomas Huth,
 Juan Quintela, Dr. David Alan Gilbert, Michael S. Tsirkin, Cornelia Huck,
 Paolo Bonzini, Stefan Hajnoczi, Fam Zheng, Eric Blake,
 Vladimir Sementsov-Ogievskiy, John Snow, Cédric Le Goater, Yishai Hadas,
 Jason Gunthorpe, Maor Gottlieb, Avihai Horon, Kirti Wankhede, Tarun Gupta,
 Joao Martins
Subject: [PATCH v8 13/13] docs/devel: Align VFIO migration docs to v2 protocol
Date: Mon, 16 Jan 2023 16:11:35 +0200
Message-ID: <20230116141135.12021-14-avihaih@nvidia.com>
In-Reply-To: <20230116141135.12021-1-avihaih@nvidia.com>
References: <20230116141135.12021-1-avihaih@nvidia.com>

Now that VFIO migration protocol v2 has been implemented and v1 protocol
has been removed, update the documentation according to v2 protocol.

Signed-off-by: Avihai Horon
Reviewed-by: Cédric Le Goater
---
 docs/devel/vfio-migration.rst | 68 ++++++++++++++++-------------------
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..1d50c2fe5f 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved state on the
 destination host. This document details how saving and restoring of VFIO
 devices is done in QEMU.
 
-Migration of VFIO devices consists of two phases: the optional pre-copy phase,
-and the stop-and-copy phase. The pre-copy phase is iterative and allows to
-accommodate VFIO devices that have a large amount of data that needs to be
-transferred. The iterative pre-copy phase of migration allows for the guest to
-continue whilst the VFIO device state is transferred to the destination, this
-helps to reduce the total downtime of the VM. VFIO devices can choose to skip
-the pre-copy phase of migration by returning pending_bytes as zero during the
-pre-copy phase.
+Migration of VFIO devices currently consists of a single stop-and-copy phase.
+During the stop-and-copy phase the guest is stopped and the entire VFIO device
+data is transferred to the destination.
+
+The pre-copy phase of migration is currently not supported for VFIO devices.
+Support for VFIO pre-copy will be added later on.
 
 A detailed description of the UAPI for VFIO device migration can be found in
-the comment for the ``vfio_device_migration_info`` structure in the header
-file linux-headers/linux/vfio.h.
+the comment for the ``vfio_device_mig_state`` structure in the header file
+linux-headers/linux/vfio.h.
 
 VFIO implements the device hooks for the iterative approach as follows:
 
-* A ``save_setup`` function that sets up the migration region and sets _SAVING
-  flag in the VFIO device state.
+* A ``save_setup`` function that sets up migration on the source.
 
-* A ``load_setup`` function that sets up the migration region on the
-  destination and sets _RESUMING flag in the VFIO device state.
+* A ``load_setup`` function that sets the VFIO device on the destination in
+  _RESUMING state.
 
 * A ``save_live_pending`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
-* A ``save_live_iterate`` function that reads the VFIO device's data from the
-  vendor driver through the migration region during iterative phase.
-
 * A ``save_state`` function to save the device config space if it is present.
 
-* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
-  VFIO device state and iteratively copies the remaining data for the VFIO
-  device until the vendor driver indicates that no data remains (pending bytes
-  is zero).
+* A ``save_live_complete_precopy`` function that sets the VFIO device in
+  _STOP_COPY state and iteratively copies the data for the VFIO device until
+  the vendor driver indicates that no data remains.
 
 * A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above
+  sections that are generated by the save functions above.
 
 * ``cleanup`` functions for both save and load that perform any migration
-  related cleanup, including unmapping the migration region
+  related cleanup.
 
 
 The VFIO migration code uses a VM state change handler to change the VFIO
@@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver
 can also be written by the device. There is currently no device or IOMMU
 support for dirty page tracking in hardware.
 
-By default, dirty pages are tracked when the device is in pre-copy as well as
-stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
-the destination in both phases. Copying dirty pages in pre-copy phase helps
-QEMU to predict if it can achieve its downtime tolerances. If QEMU during
-pre-copy phase keeps finding dirty pages continuously, then it understands
-that even in stop-and-copy phase, it is likely to find dirty pages and can
-predict the downtime accordingly.
+By default, dirty pages are tracked during pre-copy as well as stop-and-copy
+phase. So, a page pinned by the vendor driver will be copied to the destination
+in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
+it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
+finding dirty pages continuously, then it understands that even in stop-and-copy
+phase, it is likely to find dirty pages and can predict the downtime
+accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables querying the dirty bitmap during pre-copy phase. If it is set to
@@ -111,23 +104,22 @@ Live migration save path
                                   |
                      migrate_init spawns migration_thread
                 Migration thread then calls each device's .save_setup()
-                    (RUNNING, _SETUP, _RUNNING|_SAVING)
+                       (RUNNING, _SETUP, _RUNNING)
                                   |
-                    (RUNNING, _ACTIVE, _RUNNING|_SAVING)
+                      (RUNNING, _ACTIVE, _RUNNING)
              If device is active, get pending_bytes by .save_live_pending()
           If total pending_bytes >= threshold_size, call .save_live_iterate()
-                  Data of VFIO device for pre-copy phase is copied
         Iterate till total pending bytes converge and are less than threshold
                                   |
   On migration completion, vCPU stops and calls .save_live_complete_precopy for
-   each active device. The VFIO device is then transitioned into _SAVING state
-                   (FINISH_MIGRATE, _DEVICE, _SAVING)
+  each active device. The VFIO device is then transitioned into _STOP_COPY state
+                  (FINISH_MIGRATE, _DEVICE, _STOP_COPY)
                                   |
      For the VFIO device, iterate in .save_live_complete_precopy until
                          pending data is 0
-                   (FINISH_MIGRATE, _DEVICE, _STOPPED)
+                   (FINISH_MIGRATE, _DEVICE, _STOP)
                                   |
-                 (FINISH_MIGRATE, _COMPLETED, _STOPPED)
+                 (FINISH_MIGRATE, _COMPLETED, _STOP)
              Migraton thread schedules cleanup bottom half and exits
 
 Live migration resume path
@@ -136,7 +128,7 @@ Live migration resume path
 ::
 
                         Incoming migration calls .load_setup for each device
-                       (RESTORE_VM, _ACTIVE, _STOPPED)
+                       (RESTORE_VM, _ACTIVE, _STOP)
                                   |
        For each device, .load_state is called for that device section data
                        (RESTORE_VM, _ACTIVE, _RESUMING)
-- 
2.26.3
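
Reading aid, not part of the patch: the save-path flow in the updated
document can be walked through with a small toy model. This is only a
sketch under stated assumptions. ToyVfioDevice, save_path and chunk_size
are hypothetical names invented for illustration; they are not QEMU or
kernel interfaces. The state tuples are taken from the flow diagram, in the
same order as there (VM state, migration state, VFIO device state).

```python
from dataclasses import dataclass, field


@dataclass
class ToyVfioDevice:
    """Hypothetical stand-in for a VFIO device; not a QEMU or kernel API."""
    data: bytes
    state: str = "_RUNNING"
    trace: list = field(default_factory=list)

    def set_state(self, vm_state, mig_state, dev_state):
        # Record the tuple in the same order the flow diagram uses.
        self.state = dev_state
        self.trace.append((vm_state, mig_state, dev_state))


def save_path(dev, chunk_size=4):
    """Walk the v2 save path from the diagram and return the copied data."""
    dev.set_state("RUNNING", "_SETUP", "_RUNNING")    # .save_setup()
    dev.set_state("RUNNING", "_ACTIVE", "_RUNNING")   # guest keeps running

    # .save_live_complete_precopy: the device enters _STOP_COPY, then its
    # data is copied iteratively until no data remains.
    dev.set_state("FINISH_MIGRATE", "_DEVICE", "_STOP_COPY")
    copied = bytearray()
    offset = 0
    while offset < len(dev.data):
        copied += dev.data[offset:offset + chunk_size]
        offset += chunk_size

    dev.set_state("FINISH_MIGRATE", "_DEVICE", "_STOP")
    dev.set_state("FINISH_MIGRATE", "_COMPLETED", "_STOP")
    return bytes(copied)
```

Calling save_path(ToyVfioDevice(data=b"0123456789")) returns the full
device data, and the device's trace then holds the five state tuples from
the save-path diagram. The model copies everything in the stop-and-copy
step, matching the statement that pre-copy is not yet supported for VFIO.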