FWIW, with this support in QEMU I've just found and fixed two guest
kernel regressions in MSI-X handling under Xen. And having done that, I
can get back to frowning at the locking on the qemu side...
On Sat, 2023-01-14 at 00:39 +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> hw/i386/kvm/xen_evtchn.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
> index 18c88229ab..c4103ee98b 100644
> --- a/hw/i386/kvm/xen_evtchn.c
> +++ b/hw/i386/kvm/xen_evtchn.c
> @@ -1436,6 +1436,14 @@ bool xen_evtchn_set_gsi(int gsi, int level)
> return false;
> }
>
> + /*
> + * XXX: We access this without locking. Because we'd deadlock
> + * if it was the callback_gsi. Do something cleverer.
> + */
> + if (gsi && gsi == s->callback_gsi) {
> + return false;
> + }
> +
> QEMU_LOCK_GUARD(&s->port_lock);
>
> pirq = s->gsi_pirq[gsi];
I tried adding a QemuRecMutex just around s->callback_gsi which kind of
works, but then I realised that I'm not supposed to be calling
qemu_set_irq() without holding the BQL, am I?
The code which does so (in xen_evtchn_set_callback_level()) sometimes
gets triggered from a vCPU making a hypercall (which doesn't hold the
BQL), and sometimes from a PV backend (which *will* hold the BQL). In
fact the latter is much more common in the case where GSI delivery is
needed.
Because it's called from the PV backends, I don't think I can avoid the
rule that the BQL is taken first, and the evtchn port_lock taken inside
it.
So I must never block on the BQL from the hypercall code paths. I think
that means I should refactor xen_evtchn_set_callback_level() to:
• Take an argument telling it whether it is running from a context
with the BQL already held.
• Do a *trylock* on the BQL if isn't already held.
• If the trylock fails, invoke itself from a BH on the main I/O
thread.
Does that seem reasonable?