[RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi()

David Woodhouse posted 5 patches 2 years, 3 months ago
[RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi()
Posted by David Woodhouse 2 years, 3 months ago
From: David Woodhouse <dwmw@amazon.co.uk>

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 hw/i386/kvm/xen_evtchn.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
index 18c88229ab..c4103ee98b 100644
--- a/hw/i386/kvm/xen_evtchn.c
+++ b/hw/i386/kvm/xen_evtchn.c
@@ -1436,6 +1436,14 @@ bool xen_evtchn_set_gsi(int gsi, int level)
         return false;
     }
 
+    /*
+     * XXX: We access this without locking. Because we'd deadlock
+     * if it was the callback_gsi. Do something cleverer.
+     */
+    if (gsi && gsi == s->callback_gsi) {
+        return false;
+    }
+
     QEMU_LOCK_GUARD(&s->port_lock);
 
     pirq = s->gsi_pirq[gsi];
-- 
2.35.3
Re: [RFC PATCH 4/5] hw/xen: [FIXME] Avoid deadlock in xen_evtchn_set_gsi()
Posted by David Woodhouse 2 years, 3 months ago
FWIW, with this support in QEMU I've just found and fixed two guest
kernel regressions in MSI-X handling under Xen. And having done that, I
can get back to frowning at the locking on the qemu side...

On Sat, 2023-01-14 at 00:39 +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  hw/i386/kvm/xen_evtchn.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/hw/i386/kvm/xen_evtchn.c b/hw/i386/kvm/xen_evtchn.c
> index 18c88229ab..c4103ee98b 100644
> --- a/hw/i386/kvm/xen_evtchn.c
> +++ b/hw/i386/kvm/xen_evtchn.c
> @@ -1436,6 +1436,14 @@ bool xen_evtchn_set_gsi(int gsi, int level)
>          return false;
>      }
>  
> +    /*
> +     * XXX: We access this without locking. Because we'd deadlock
> +     * if it was the callback_gsi. Do something cleverer.
> +     */
> +    if (gsi && gsi == s->callback_gsi) {
> +        return false;
> +    }
> +
>      QEMU_LOCK_GUARD(&s->port_lock);
>  
>      pirq = s->gsi_pirq[gsi];

I tried adding a QemuRecMutex just around s->callback_gsi which kind of
works, but then I realised that I'm not supposed to be calling
qemu_set_irq() without holding the BQL, am I?

The code which does so (in xen_evtchn_set_callback_level()) sometimes
gets triggered from a vCPU making a hypercall (which doesn't hold the
BQL), and sometimes from a PV backend (which *will* hold the BQL). In
fact the latter is much more common in the case where GSI delivery is
needed.

Because it's called from the PV backends, I don't think I can avoid the
rule that the BQL is taken first, and the evtchn port_lock taken inside
it.

So I must never block on the BQL from the hypercall code paths. I think
that means I should refactor xen_evtchn_set_callback_level() to:

  • Take an argument telling it whether it is running from a context 
    with the BQL already held.
  • Do a *trylock* on the BQL if isn't already held.
  • If the trylock fails, invoke itself from a BH on the main I/O
    thread.

Does that seem reasonable?