This is a series of semi-related patches for the x86 macOS Hypervisor.framework (hvf) accelerator backend. The intention is to (1) fix bugs and (2) move the hvf backend to more modern and efficient Hypervisor.framework APIs. The goal is to replace the main hv_vcpu_run() call with hv_vcpu_run_until(). On the one hand, doing so noticeably improves performance in itself. On the other hand, using the new API is a prerequisite for enabling hvf's in-kernel virtual APIC implementation, which provides a further, even more drastic performance improvement on many workloads. Integrating the APIC requires a number of large commits which still need some compatibility bugfixing, and which I hope to submit as a later patch set.

Compared to v2 of the patch set, I've re-added the kick and hv_vcpu_run_until patches after analysing hv_vcpu_interrupt in more detail and finding it safe. Plus, there's an ergonomic improvement to the assert_hvf_ok macro.

In this series:

Patch 1 enables the INVTSC CPUID bit when running with hvf. This can enable some optimisations in the guest OS, and I've not found any reason it shouldn't be allowed for hvf-based hosts. It now also includes setting a migration blocker when the feature is active.

Patch 2 fixes a bunch of compile warnings that kept littering my build logs, so I finally caved and fixed them. As far as I can tell, these were all ancient typos.

Patch 3 sorts out the minor mess of hvf vCPU IDs/handles. The aarch64 and x86-64 versions of Hypervisor.framework's APIs use different integral types (uint64_t vs unsigned int) when referencing vCPUs, so this changes the code to use the correct one depending on build arch instead of awkward pointer casts. (There's currently only one instance of such a cast, but patches 5 and 6 would have added more, so I'm fixing this preemptively.)

Patch 4 fixes dirty page tracking for the x86 hvf backend. This has previously only accidentally worked because hv_vcpu_run() makes spurious EPT fault exits.
Switching to hv_vcpu_run_until() surfaces this issue when using an emulated VGA adapter, for example, as those use dirty page tracking to do partial screen updates. This issue caused problems on previous attempts to switch to the newer HVF APIs, as it had been masked by the inefficiency of the older APIs.

Patch 5 implements a "real" vCPU kick using hv_vcpu_interrupt(). So far, the x86 hvf backend's external interrupt injection has relied on hv_vcpu_run()'s behaviour of exiting the VM very frequently without host action to handle external interrupts. This is not a great state of affairs in itself, but it fails entirely when switching to hv_vcpu_run_until(), which returns only when there is actual work to be done by the VMM.

In previous attempts to introduce this improved "kick", there has been doubt about hv_vcpu_interrupt()'s reliability in edge cases, particularly when an interrupt is issued while the vCPU thread is either not running hv_vcpu_run_until() at all, or is in the process of entering or exiting VMX mode. I believe this concern is unfounded, for three reasons:

1) The patches have been in use in a fleet of hundreds of production systems running CI workloads for over two years. No symptoms of a missed interrupt (or indeed any other issues) have been encountered.

2) I have tried to provoke such an edge case and failed. To do this, I hacked up Michael Steil's toy "hvdos" VMM to run some bare-metal assembly code, then hit the running VM with a barrage of hv_vcpu_interrupt calls at randomly spaced intervals. Any hv_vcpu_interrupt call is followed shortly by a VM exit, regardless of what state the vCPU thread was in. Multiple interrupts in short succession are coalesced, but only before a VM exit. If an interrupt is issued after the VM has already exited, a further exit is triggered as soon as the vCPU thread attempts to re-enter the VM.
The code for this is here: https://gitlab.com/pmdj/hvf-edge-cases

3) The doubts about hv_vcpu_interrupt edge cases seem to originate in observed behaviour that was actually caused by the dirty memory tracking bug fixed in patch 4 of this series. That bug had nothing to do with hv_vcpu_interrupt as such; it was surfaced by hv_vcpu_run_until()'s change in EPT fault exit behaviour compared to hv_vcpu_run().

Of course, this is not PROOF of absence of defects, but I'm not aware of any formal proofs covering other QEMU code or dependencies. I have also asked Apple's DTS to officially clarify hv_vcpu_interrupt()'s behaviour in edge cases, but unfortunately received no reply.

Patch 6 switches from hv_vcpu_run() to hv_vcpu_run_until() where supported. This was of course the goal, and the previous patches fix all the bugs that this switch would otherwise have surfaced.

Patch 7 provides a small improvement to error messages if and when an hvf call fails. assert_hvf_ok() would previously only tell you the error code of the failing call, not which call was failing, nor the call site. The change makes it behave more like a classic assertion, reporting the expression as well as the source code location along with the error.

changelog:

v3:
- Back to one patch series with all the changes.
- Detailed investigation into edge case behaviour of hv_vcpu_interrupt. Determined it was behaving fine, with no race condition workarounds needed, so the kick and hv_vcpu_run_until patches have stayed essentially the same as in v1.
- Added the assert_hvf_ok patch because I kept using it when debugging.

v2:
- Migration blocker when INVTSC is set (Thanks Paolo for pointing that out!)
- Dirty page tracking fix (Thanks Roman for noticing the regression in observed behaviour on certain VMs, which led me to debugging this issue.)
- vCPU handle type cleanup (Based on discussion with Paolo)
- Added fixes for existing compile warnings.
- Split patch series into 2 parts.
This work has been sponsored by Sauce Labs Inc.

Phil Dennis-Jordan (7):
  i386/hvf: Adds support for INVTSC cpuid bit
  i386/hvf: Fixes some compilation warnings
  hvf: Consistent types for vCPU handles
  i386/hvf: Fixes dirty memory tracking by page granularity RX->RWX change
  i386/hvf: In kick_vcpu use hv_vcpu_interrupt to force exit
  i386/hvf: Updates API usage to use modern vCPU run function
  hvf: Makes assert_hvf_ok report failed expression

 accel/hvf/hvf-accel-ops.c    |  2 +-
 accel/hvf/hvf-all.c          | 49 ++++++++++++++++--------------------
 include/sysemu/hvf_int.h     |  9 +++++--
 target/i386/hvf/hvf.c        | 47 +++++++++++++++++++++++++++++++---
 target/i386/hvf/vmx.h        |  3 +--
 target/i386/hvf/x86_cpuid.c  |  4 +++
 target/i386/hvf/x86_decode.c |  2 +-
 target/i386/hvf/x86_emu.c    |  4 +--
 8 files changed, 81 insertions(+), 39 deletions(-)

-- 
2.36.1
Queued, thanks. Thanks for persisting! It sucks that the hv_vcpu_interrupt() API docs are not clear, but your tests are great. The self-interrupt one is the case that I was most worried about, and you're covering it. Sorry for being a pain for nothing, at least retrospectively.

Paolo
On Thu, 6 Jun 2024 at 10:24, Paolo Bonzini <pbonzini@redhat.com> wrote:
> Queued, thanks.

Thanks - also for reviewing, etc.!

> Thanks for persisting! It sucks that the hv_vcpu_interrupt() API docs
> are not clear, but your tests are great. The self-interrupt one is
> the case that I was most worried about, and you're covering it.
> Sorry for being a pain for nothing, at least retrospectively.

No worries - the concern is understandable, especially in the face of the unfortunate apparent regression which turned out to be the dirty page tracking bug. And I agree: the hv_vcpu_interrupt docs, along with the rest of Hypervisor.framework's, are terrible. There does not appear to have been any thought about what a developer using that API might care about. I've been working on integrating the HVF APIC/PIC/IOAPIC implementations, and there are ambiguities and edge cases galore. Unfortunately (?), the perf improvement is worth the trouble of trial & error…

Phil