[PATCH v4 00/14] vfio/migration: Device dirty page tracking

Joao Martins posted 14 patches 1 year, 1 month ago
There is a newer version of this series
Hey,

Presented herewith is a series based on the basic VFIO migration protocol v2
implementation [1].

It is split from its parent series [5] to focus solely on device dirty
page tracking. Device dirty page tracking allows the VFIO device to
record its DMAs and report them back when needed. This is part of VFIO
migration and is used during the pre-copy phase of migration to track
the RAM pages that the device has written to and mark those pages dirty,
so they can later be re-sent to the target.
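
To make "record its DMAs" concrete, here is a minimal sketch of a per-page
dirty log as a device might keep it: one bit per page of a tracked IOVA
window, set whenever a DMA write touches that page. The DirtyLog type and
the helper names are hypothetical illustrations, not kernel, driver or
QEMU code; real devices keep this state in hardware.

```c
#include <stdint.h>

/* Hypothetical device-side log: one bit per page of [base, base + pages * page_size). */
typedef struct {
    uint64_t base;       /* first IOVA covered by the log */
    uint64_t page_size;  /* tracking granularity */
    uint64_t pages;      /* number of pages covered */
    uint64_t *bits;      /* 64 pages per word */
} DirtyLog;

/* Record a DMA write of @len bytes at @iova by setting the bit of every page it touches. */
static void dirty_log_record_dma(DirtyLog *log, uint64_t iova, uint64_t len)
{
    if (len == 0 || iova < log->base) {
        return;
    }
    uint64_t first = (iova - log->base) / log->page_size;
    uint64_t last = (iova - log->base + len - 1) / log->page_size;

    for (uint64_t p = first; p <= last && p < log->pages; p++) {
        log->bits[p / 64] |= UINT64_C(1) << (p % 64);
    }
}

/* Return 1 if page @p was written, i.e. migration must re-send it. */
static int dirty_log_test(const DirtyLog *log, uint64_t p)
{
    return (int)((log->bits[p / 64] >> (p % 64)) & 1);
}
```

A write that starts mid-page and spans two page lengths marks three pages
dirty, which is why tracking granularity directly affects how much memory
gets re-sent.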

Device dirty page tracking uses the DMA logging uAPI to discover device
capabilities, to start and stop tracking, and to get the dirty page
bitmap report. Extra details and the uAPI definition can be found
here [3].
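
In that uAPI each operation is a VFIO_DEVICE_FEATURE ioctl on the device
fd, with the feature index in the low 16 bits of flags. The sketch below
builds the argument for starting DMA logging. The structs are local
mirrors of the <linux/vfio.h> definitions (the kernel uses __aligned_u64
and names the padding field __reserved), reproduced so the example builds
without kernel headers and assuming a 64-bit host; dma_logging_start_request()
is a hypothetical helper, not the function used in this series.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirrors of <linux/vfio.h> definitions (assumption: Linux 6.1+ uAPI, 64-bit host). */
#define VFIO_DEVICE_FEATURE_SET               (1u << 17)
#define VFIO_DEVICE_FEATURE_DMA_LOGGING_START 6u

struct vfio_device_feature {
    uint32_t argsz;
    uint32_t flags;     /* feature index | GET/SET/PROBE */
    uint8_t data[];
};

struct vfio_device_feature_dma_logging_range {
    uint64_t iova;
    uint64_t length;
};

struct vfio_device_feature_dma_logging_control {
    uint64_t page_size;
    uint32_t num_ranges;
    uint32_t reserved;  /* __reserved in the kernel header */
    uint64_t ranges;    /* user pointer to the range array */
};

/*
 * Build the argument for ioctl(device_fd, VFIO_DEVICE_FEATURE, feature)
 * asking the device to log DMA writes to @ranges. The caller frees the
 * result; @ranges must stay valid until the ioctl returns.
 */
static struct vfio_device_feature *
dma_logging_start_request(const struct vfio_device_feature_dma_logging_range *ranges,
                          uint32_t num_ranges, uint64_t page_size)
{
    size_t argsz = sizeof(struct vfio_device_feature) +
                   sizeof(struct vfio_device_feature_dma_logging_control);
    struct vfio_device_feature *feature = calloc(1, argsz);
    struct vfio_device_feature_dma_logging_control control = {
        .page_size = page_size,
        .num_ranges = num_ranges,
        .ranges = (uint64_t)(uintptr_t)ranges,
    };

    if (!feature) {
        return NULL;
    }
    feature->argsz = (uint32_t)argsz;
    feature->flags = VFIO_DEVICE_FEATURE_SET | VFIO_DEVICE_FEATURE_DMA_LOGGING_START;
    memcpy(feature->data, &control, sizeof(control));
    return feature;
}
```

Capability discovery uses the same feature index with
VFIO_DEVICE_FEATURE_PROBE instead of VFIO_DEVICE_FEATURE_SET; stopping and
reporting use the DMA_LOGGING_STOP and DMA_LOGGING_REPORT feature indexes.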

Device dirty page tracking operates in VFIOContainer scope, i.e., when
dirty tracking is started or stopped, or the dirty page report is
queried, all devices within a VFIOContainer are iterated and the
corresponding operation is performed on each of them.
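
The container-scope iteration can be sketched as below, in the shape the
v3 changelog describes: start() fails early and undoes any partial work,
while stop() is void and always walks every device. Device, Container and
both functions are hypothetical, simplified stand-ins for VFIODevice,
VFIOContainer and the series' helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VFIODevice / VFIOContainer. */
typedef struct Device {
    const char *name;
    bool (*start)(struct Device *dev);  /* e.g. issues DMA_LOGGING_START */
    void (*stop)(struct Device *dev);   /* e.g. issues DMA_LOGGING_STOP */
    bool tracking;
} Device;

typedef struct {
    Device **devices;
    size_t num_devices;
} Container;

/* Stop tracking on every device that is tracking; never returns early. */
static void container_dirty_tracking_stop(Container *c)
{
    for (size_t i = 0; i < c->num_devices; i++) {
        Device *dev = c->devices[i];

        if (dev->tracking) {
            dev->stop(dev);
            dev->tracking = false;
        }
    }
}

/* Start tracking on every device; on the first failure, roll back and fail. */
static bool container_dirty_tracking_start(Container *c)
{
    for (size_t i = 0; i < c->num_devices; i++) {
        Device *dev = c->devices[i];

        if (!dev->start(dev)) {
            container_dirty_tracking_stop(c);
            return false;
        }
        dev->tracking = true;
    }
    return true;
}
```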

Device dirty page tracking is used only if all devices within a
VFIOContainer support it. Otherwise, VFIO IOMMU dirty page tracking is
used, and if that is not supported either, memory is perpetually marked
dirty by QEMU. Note that since VFIO IOMMU dirty page tracking has no HW
support, the last two options usually have the same effect of
perpetually marking all pages dirty.
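
The fallback order above amounts to a small decision function. This is
a sketch only: the enum and select_dirty_tracking() are hypothetical
names, and treating an empty container as "no device tracking" is an
assumption of the example.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    DIRTY_TRACKING_DEVICE,     /* every device supports DMA logging */
    DIRTY_TRACKING_IOMMU,      /* VFIO IOMMU dirty page tracking */
    DIRTY_TRACKING_ALL_DIRTY,  /* no tracking: all memory reported dirty */
} DirtyTrackingMode;

static DirtyTrackingMode
select_dirty_tracking(const bool *device_supports, size_t num_devices,
                      bool iommu_supports)
{
    /* Device tracking only if *all* devices in the container support it. */
    bool all_devices = num_devices > 0;

    for (size_t i = 0; i < num_devices; i++) {
        all_devices = all_devices && device_supports[i];
    }
    if (all_devices) {
        return DIRTY_TRACKING_DEVICE;
    }
    if (iommu_supports) {
        return DIRTY_TRACKING_IOMMU;
    }
    return DIRTY_TRACKING_ALL_DIRTY;
}
```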

Normally, when asked to start dirty tracking, all currently DMA-mapped
ranges are tracked by device dirty page tracking. If a vIOMMU is in use,
live migration is blocked. This is temporary, and a separate series will
add support for it. Thus, this series focuses on getting the groundwork
in first.
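
Tracking all DMA-mapped ranges does not require handing every mapping to
the device: the v2 changelog describes reducing them to one span for the
32-bit and one for the 64-bit address space, with the minimums
initialized to UINT32_MAX/UINT64_MAX. A minimal sketch of that folding,
with hypothetical names (DirtyRanges and its helpers are not the series'
code), assuming inclusive end addresses and that a mapping crossing
4 GiB goes to the 64-bit span:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t min32, max32;  /* span of mappings entirely below 4 GiB */
    uint64_t min64, max64;  /* span of all other mappings */
} DirtyRanges;

static void dirty_ranges_init(DirtyRanges *r)
{
    /* Start each minimum at the largest value so the first mapping lowers it. */
    r->min32 = UINT32_MAX;
    r->max32 = 0;
    r->min64 = UINT64_MAX;
    r->max64 = 0;
}

/* Fold one DMA-mapped range [iova, end] (end inclusive) into the spans. */
static void dirty_ranges_update(DirtyRanges *r, uint64_t iova, uint64_t end)
{
    bool is32 = end <= UINT32_MAX;
    uint64_t *min = is32 ? &r->min32 : &r->min64;
    uint64_t *max = is32 ? &r->max32 : &r->max64;

    if (iova < *min) {
        *min = iova;
    }
    if (end > *max) {
        *max = end;
    }
}

/* Number of non-empty spans (0, 1 or 2) to pass to the device. */
static size_t dirty_ranges_count(const DirtyRanges *r)
{
    return (size_t)(r->max32 >= r->min32) + (size_t)(r->max64 >= r->min64);
}
```

Each non-empty span becomes one logging range of length max - min + 1,
so the request size stays constant no matter how many mappings exist.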

The series is organized as follows:

- Patches 1-7: Fix bugs and do some preparatory work required prior to
  adding device dirty page tracking.
- Patches 8-11: Implement device dirty page tracking.
- Patch 12: Blocks live migration with vIOMMU.
- Patches 13-14: Detect device dirty page tracking and document it.

As usual, comments and improvements are appreciated.

Thanks,
	Joao

Changes from v3 [6]:
- Added R-bs in patches 4, 5, 6, 13 and 14
  (did not add the others because those patches changed a lot)
- Fix the live migration unblocker by moving vfio_unblock_giommu_migration()
  into vfio_instance_finalize()
- Refactor/Simplify the test for vIOMMU enabled
  (patch 12)
- Change the style of how we set features::flags
  (patch 9, 11)
- Return -ENOMEM in vfio_bitmap_alloc(), and change callsites to return
  ret instead of errno
  (patch 4)
- Remove iova-tree includes
- Initialize range min{32,64} to UINT{32,64}_MAX to better calculate the
  minimum range without assumptions.
- Add a comment explaining why we unregister the memory listener
- Add a comment about the dual split of ranges
- Remove the mutex because the memory listener callbacks are all serialized
- Move out the vfio_section_get_iova_range() into its own patch and
  make vfio_listener_region_add() use it too.
- Add a VFIODirtyRanges struct which is allocated from the stack as
  opposed to being stored in the container and make the listener be
  registered with it.
- Remove stale paragraph from commit message
  (patch 8)
- Unroll vfio_device_dma_logging_set() into its own code in start(), which
  fails early and returns, and in stop(), which is void and never returns
  early.
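
The vfio_section_get_iova_range() helper mentioned above turns a memory
section into a page-aligned inclusive [iova, end] range, skipping
sections that contain no whole page. QEMU does this with Int128 and its
host-page helpers; the sketch below is a hypothetical 64-bit-only
equivalent (section_iova_range() is not the series' function) that
guards explicitly against wrap-around and assumes page_size is a power
of two.

```c
#include <stdbool.h>
#include <stdint.h>

static bool section_iova_range(uint64_t offset, uint64_t size, uint64_t page_size,
                               uint64_t *out_iova, uint64_t *out_end)
{
    uint64_t mask = page_size - 1;
    uint64_t last, iova, end;

    /* Empty section, or one whose last byte would lie past 2^64 - 1. */
    if (size == 0 || size - 1 > UINT64_MAX - offset) {
        return false;
    }
    last = offset + (size - 1);            /* inclusive last byte */

    /* Round the start up; if that wraps, no whole page follows @offset. */
    if (offset > UINT64_MAX - mask) {
        return false;
    }
    iova = (offset + mask) & ~mask;

    /* Round the inclusive end down to the last byte of a whole page. */
    if ((last & mask) == mask) {
        end = last;
    } else if (last < page_size) {
        return false;                      /* no whole page at all */
    } else {
        end = (last & ~mask) - 1;
    }

    if (iova > end) {
        return false;                      /* no whole page inside the section */
    }
    *out_iova = iova;
    *out_end = end;
    return true;
}
```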

Changes from v2 [5]:
- Split the initial dirty page tracking support out of the parent series,
  breaking it into smaller parts.
- Replace an IOVATree with a simple two-range setup: one range for the
  32-bit and another for the 64-bit address space. After discussion, this
  approach was chosen because an IOVATree adds unnecessary complexity,
  whereas two ranges are more efficient and put less stress on the uAPI
  limits. (patch 7 and 8)
- For now exclude vIOMMU, and so add a live migration blocker if a
  vIOMMU is passed in. This will be followed up with vIOMMU support in
  a separate series. (patch 10)
- Add new patches that consolidate the helpers used across memory
  listeners, so they can be reused when recording DMA ranges. (patch 5 and 6)
- Adjust the documentation to avoid mentioning vIOMMU support and instead
  state that vIOMMU with device dirty page tracking is blocked. Cedric
  gave an R-b, but I've dropped it given the split and the lack of vIOMMU
  support. (patch 13)
- Improve VFIOBitmap to avoid allocating a 16-byte structure; place it on
  the stack instead. Remove the free helper function. (patch 4)
- Fix the compilation issues (patch 8 and 10). Possibly not 100%
  addressed, as I am still setting up an environment to reproduce them.
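
The VFIOBitmap changes (stack-allocated struct, alloc returning -ENOMEM)
follow a familiar C pattern: only the bit array lives on the heap, and
callers propagate the returned error code. The sketch below uses a
hypothetical Bitmap type and bitmap_alloc() helper rather than QEMU's
definitions; rounding the size up to whole 64-bit words and rejecting
zero-sized ranges with -EINVAL are assumptions of the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for VFIOBitmap: the struct itself lives on the caller's stack. */
typedef struct {
    uint64_t *bitmap;  /* one bit per page, heap-allocated */
    uint64_t size;     /* bytes in @bitmap, whole 64-bit words */
    uint64_t pages;    /* pages covered */
} Bitmap;

static int bitmap_alloc(Bitmap *vbmap, uint64_t range_size, uint64_t page_size)
{
    if (range_size == 0 || page_size == 0) {
        return -EINVAL;
    }
    vbmap->pages = (range_size + page_size - 1) / page_size;
    vbmap->size = (vbmap->pages + 63) / 64 * sizeof(uint64_t);
    vbmap->bitmap = calloc(1, vbmap->size);
    if (!vbmap->bitmap) {
        return -ENOMEM;
    }
    return 0;
}
```

A caller would then write `Bitmap vbmap; int ret = bitmap_alloc(&vbmap,
size, page_size); if (ret) { return ret; }` and free only vbmap.bitmap
when done, with no separate free helper needed for the struct.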

Changes from v1 [4]:
- Rebased on latest master branch. As part of it, made some changes in
  pre-copy to adjust it to Juan's new patches:
  1. Added a new patch that passes threshold_size parameter to
     .state_pending_{estimate,exact}() handlers.
  2. Added a new patch that refactors vfio_save_block().
  3. Changed the pre-copy patch to cache and report pending pre-copy
     size in the .state_pending_estimate() handler.
- Removed unnecessary P2P code. This should be added later on when P2P
  support is added. (Alex)
- Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
  (patch #11). (Alex)
- Stored vfio_devices_all_device_dirty_tracking()'s value in a local
  variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
- Refactored the vIOMMU device dirty tracking ranges creation code to
  make it clearer (patch #15).
- Changed overflow check in vfio_iommu_range_is_device_tracked() to
  emphasize that we specifically check for 2^64 wrap around (patch #15).
- Added R-bs / Acks.
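
An explicit 2^64 wrap-around check, as mentioned for
vfio_iommu_range_is_device_tracked() above, usually relies on unsigned
addition wrapping modulo 2^64: a range end that wrapped is smaller than
its start. The sketch below is a hypothetical containment helper
(range_is_tracked() is not the series' function) using that idiom with
inclusive ends.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is [iova, iova + size) fully inside the tracked range [start, start + len)? */
static bool range_is_tracked(uint64_t start, uint64_t len,
                             uint64_t iova, uint64_t size)
{
    uint64_t last, tracked_last;

    if (size == 0 || len == 0) {
        return false;
    }
    last = iova + (size - 1);           /* inclusive ends avoid needing 2^64 */
    tracked_last = start + (len - 1);

    /* Unsigned addition wraps modulo 2^64: a wrapped end is below its start. */
    if (last < iova || tracked_last < start) {
        return false;
    }
    return iova >= start && last <= tracked_last;
}
```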

[1] https://lore.kernel.org/qemu-devel/167658846945.932837.1420176491103357684.stgit@omen/
[2] https://lore.kernel.org/kvm/20221206083438.37807-3-yishaih@nvidia.com/
[3] https://lore.kernel.org/netdev/20220908183448.195262-4-yishaih@nvidia.com/
[4] https://lore.kernel.org/qemu-devel/20230126184948.10478-1-avihaih@nvidia.com/
[5] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
[6] https://lore.kernel.org/qemu-devel/20230304014343.33646-1-joao.m.martins@oracle.com/

Avihai Horon (6):
  vfio/common: Fix error reporting in vfio_get_dirty_bitmap()
  vfio/common: Fix wrong %m usages
  vfio/common: Abort migration if dirty log start/stop/sync fails
  vfio/common: Add VFIOBitmap and alloc function
  vfio/common: Extract code from vfio_get_dirty_bitmap() to new function
  docs/devel: Document VFIO device dirty page tracking

Joao Martins (8):
  vfio/common: Add helper to validate iova/end against hostwin
  vfio/common: Consolidate skip/invalid section into helper
  vfio/common: Add helper to consolidate iova/end calculation
  vfio/common: Record DMA mapped IOVA ranges
  vfio/common: Add device dirty page tracking start/stop
  vfio/common: Add device dirty page bitmap sync
  vfio/migration: Block migration with vIOMMU
  vfio/migration: Query device dirty page tracking support

 docs/devel/vfio-migration.rst |  46 ++-
 hw/vfio/common.c              | 685 ++++++++++++++++++++++++++++------
 hw/vfio/migration.c           |  20 +
 hw/vfio/pci.c                 |   1 +
 hw/vfio/trace-events          |   2 +
 include/hw/vfio/vfio-common.h |  17 +
 6 files changed, 634 insertions(+), 137 deletions(-)

-- 
2.17.2