[RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Eugenio Pérez 1 year, 11 months ago
The control virtqueue (CVQ) is used by the networking device to accept
various commands from the driver. It is a must to support multiqueue and
other configurations.
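
For reference, this is how the VirtIO spec lays out a CVQ command (these
definitions mirror include/standard-headers/linux/virtio_net.h; the MQ
payload is just one example of a command class):

struct virtio_net_ctrl_hdr {
    uint8_t class;            /* e.g. VIRTIO_NET_CTRL_MQ */
    uint8_t cmd;              /* e.g. VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET */
};

struct virtio_net_ctrl_mq {
    uint16_t virtqueue_pairs; /* requested number of queue pairs (LE) */
};

The driver exposes the header and payload as device-readable descriptors,
plus one device-writable status byte that the device fills with
VIRTIO_NET_OK or VIRTIO_NET_ERR.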

Shadow VirtQueue (SVQ) already makes the migration of virtqueue states
possible, effectively intercepting them so qemu can track which regions of
memory are dirtied by device actions and need migration. However, this does
not cover the networking device state seen by the driver, set through CVQ
messages like MAC address changes.

To solve that, this series uses the proposed SVQ infrastructure to intercept
the networking control messages used by the device. This way, qemu is able
to update the VirtIONet device model and to migrate it.

However, intercepting all the queues would slow down device data forwarding.
To avoid that, only the CVQ must be intercepted all the time. This is
achieved using the ASID infrastructure, which allows different translations
for different virtqueues. The most up-to-date kernel part of ASID is
proposed at [1].

You can run qemu in two modes after applying this series: intercepting only
the CVQ with x-cvq-svq=on, or intercepting all the virtqueues by also adding
x-svq=on to the cmdline:

-netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on
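
A complete invocation could look like the following; the machine and
virtio-net-pci options are illustrative, only the -netdev flags come from
this series:

qemu-system-x86_64 -M q35,accel=kvm -m 4G \
    -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on \
    -device virtio-net-pci,netdev=vhost-vdpa0,ctrl_vq=on,mq=on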

The first three patches enable the update of the virtio-net device model for
each CVQ message acknowledged by the device.

Patches 5 to 9 enable an individual SVQ to copy the buffers to QEMU's VA.
This simplifies the memory mapping, instead of mapping all the guest's
memory as the data virtqueues do.
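
The gist of the copy mode, as a minimal sketch (the helper name is
hypothetical, not the series' actual code; iov_size/iov_to_buf are QEMU's
usual iovec helpers):

/* Bounce a guest scatter-gather list into a buffer QEMU owns, so only
 * this buffer has to be mapped in the device's address space. */
static void *svq_bounce_out(const struct iovec *sg, unsigned int num,
                            size_t *len)
{
    *len = iov_size(sg, num);           /* total guest buffer length */
    void *buf = g_malloc(*len);
    iov_to_buf(sg, num, 0, buf, *len);  /* copy guest data into QEMU's VA */
    return buf;
}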

Patch 10 allows injecting control messages into the device. This makes it
possible to set the device's state both at QEMU startup and at the live
migration destination. In the future, this may also be used to emulate
_F_ANNOUNCE.
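
For example, restoring the MAC at the destination boils down to injecting
the spec's MAC_ADDR_SET command (a sketch; vhost_svq_inject's exact
signature is assumed here, and n is the VirtIONet model):

struct virtio_net_ctrl_hdr hdr = {
    .class = VIRTIO_NET_CTRL_MAC,
    .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
};
uint8_t ack = VIRTIO_NET_ERR;
struct iovec sg[] = {
    { .iov_base = &hdr,   .iov_len = sizeof(hdr) },
    { .iov_base = n->mac, .iov_len = ETH_ALEN },    /* model's MAC */
    { .iov_base = &ack,   .iov_len = sizeof(ack) }, /* device-writable */
};

/* 2 device-readable descriptors, 1 device-writable (signature assumed);
 * on completion, ack must be checked against VIRTIO_NET_OK. */
vhost_svq_inject(svq, sg, 2, 1);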

Patch 11 updates the kernel headers, but it assigns placeholder numbers to
the needed ioctls because they are not accepted in the kernel yet.
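
For illustration, the shape of the uAPI involved (the numbers below are
placeholders in the same spirit as patch 11; the kernel will assign the
final ones):

/* Get the group a virtqueue belongs to: .index is the vq index,
 * .num returns the group. */
#define VHOST_VDPA_GET_VRING_GROUP _IOWR(VHOST_VIRTIO, 0x7B, \
                                         struct vhost_vring_state)
/* Bind a virtqueue group to an address space: .index is the group,
 * .num is the ASID. */
#define VHOST_VDPA_SET_GROUP_ASID  _IOW(VHOST_VIRTIO, 0x7C, \
                                        struct vhost_vring_state)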

Patches 12-16 enable setting the features of the net device model on the
vdpa device at device start.
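
Conceptually, the device only gets the features the guest acked on the
virtio-net model. A simplified sketch, with error handling and the series'
actual plumbing omitted (vdpa_fd and the VirtIODevice vdev assumed in
scope):

uint64_t dev_features, features;

ioctl(vdpa_fd, VHOST_GET_FEATURES, &dev_features); /* what the device offers */
features = dev_features & vdev->guest_features;    /* what the guest acked */
ioctl(vdpa_fd, VHOST_SET_FEATURES, &features);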

The last ones enable the separate ASID and SVQ.

Comments are welcome.

TODO:
* Fall back on regular CVQ if QEMU cannot isolate it in its own ASID for any
  reason, blocking migration. This is tricky, since it can leave the VM
  unable to be migrated anymore, so some way of blocking it must be used.
* Review failure paths; some have TODO notes, others don't.

Changes from rfc v7:
* Don't map all the guest address space in ASID 1; copy all the buffers
  instead. No need for more memory listeners.
* Move the net backend start callback to SVQ.
* Wait for the CVQ commands used by the device at SVQ start, avoiding races.
* Changed ioctls, but they're provisional anyway.
* Reorder commits so refactoring and code-adding ones are closer to their
  usage.
* Usual cleanup: better tracing, docs, patch messages, ...

Changes from rfc v6:
* Fix bad iotlb updates order when batching was enabled
* Add reference counting to iova_tree so cleaning is simpler.

Changes from rfc v5:
* Fix bad calculation of the cvq end group when MQ is not acked by the guest.

Changes from rfc v4:
* Add missing tracing
* Add multiqueue support
* Use the already-sent version for replacing g_memdup
* Care with memory management

Changes from rfc v3:
* Fix bad returning of descriptors to SVQ list.

Changes from rfc v2:
* Fix use-after-free.

Changes from rfc v1:
* Rebase to latest master.
* Configure ASID instead of assuming cvq asid != data vqs asid.
* Update device model so (MAC) state can be migrated too.

[1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gdawar@xilinx.com/

Eugenio Pérez (21):
  virtio-net: Expose ctrl virtqueue logic
  vhost: Add custom used buffer callback
  vdpa: control virtqueue support on shadow virtqueue
  virtio: Make virtqueue_alloc_element non-static
  vhost: Add vhost_iova_tree_find
  vdpa: Add map/unmap operation callback to SVQ
  vhost: move descriptor translation to vhost_svq_vring_write_descs
  vhost: Add SVQElement
  vhost: Add svq copy desc mode
  vhost: Add vhost_svq_inject
  vhost: Update kernel headers
  vdpa: delay set_vring_ready after DRIVER_OK
  vhost: Add ShadowVirtQueueStart operation
  vhost: Make possible to check for device exclusive vq group
  vhost: add vhost_svq_poll
  vdpa: Add vhost_vdpa_start_control_svq
  vdpa: Add asid attribute to vdpa device
  vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  vhost: Add reference counting to vhost_iova_tree
  vdpa: Add x-svq to NetdevVhostVDPAOptions
  vdpa: Add x-cvq-svq

 qapi/net.json                                |  13 +-
 hw/virtio/vhost-iova-tree.h                  |   7 +-
 hw/virtio/vhost-shadow-virtqueue.h           |  61 ++-
 include/hw/virtio/vhost-vdpa.h               |   3 +
 include/hw/virtio/vhost.h                    |   3 +
 include/hw/virtio/virtio-net.h               |   4 +
 include/hw/virtio/virtio.h                   |   1 +
 include/standard-headers/linux/vhost_types.h |  11 +-
 linux-headers/linux/vhost.h                  |  25 +-
 hw/net/vhost_net.c                           |   5 +-
 hw/net/virtio-net.c                          |  84 +++--
 hw/virtio/vhost-iova-tree.c                  |  35 +-
 hw/virtio/vhost-shadow-virtqueue.c           | 378 ++++++++++++++++---
 hw/virtio/vhost-vdpa.c                       | 206 +++++++++-
 hw/virtio/virtio.c                           |   2 +-
 net/vhost-vdpa.c                             | 294 ++++++++++++++-
 hw/virtio/trace-events                       |  10 +-
 17 files changed, 1012 insertions(+), 130 deletions(-)

-- 
2.27.0

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Jason Wang 1 year, 10 months ago
On 2022/5/20 03:12, Eugenio Pérez wrote:
> [...]


As discussed, I think we need to split this huge series into smaller ones:

1) shadow CVQ only, this makes rx-filter-event work
2) ASID support for CVQ

And for 1) we need to consider whether or not it could be simplified.

Or do it in reverse order, since if we do 1) first, we may have security 
issues.

Thoughts?

Thanks


Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Eugenio Perez Martin 1 year, 10 months ago
On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On 2022/5/20 03:12, Eugenio Pérez wrote:
> > [...]
>
> As discussed, I think we need to split this huge series into smaller ones:
>
> 1) shadow CVQ only, this makes rx-filter-event work
> 2) ASID support for CVQ
>
> And for 1) we need consider whether or not it could be simplified.
>
> Or do it in reverse order, since if we do 1) first, we may have security
> issues.
>

I'm ok with both, but I also think 2) before 1) might make more sense.
There is no way to only shadow CVQ otherwise ATM.

Can we do as with the previous base SVQ patches? They were merged although
there is still no way to enable SVQ.

Thanks!

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Eugenio Perez Martin 1 year, 10 months ago
On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > [...]
> >
> > As discussed, I think we need to split this huge series into smaller ones:
> >
> > 1) shadow CVQ only, this makes rx-filter-event work
> > 2) ASID support for CVQ
> >
> > And for 1) we need consider whether or not it could be simplified.
> >
> > Or do it in reverse order, since if we do 1) first, we may have security
> > issues.
> >
>
> I'm ok with both, but I also think 2) before 1) might make more sense.
> There is no way to only shadow CVQ otherwise ATM.
>

On second thought, that order is kind of harder.

If we only map CVQ buffers, we need to either:
a. Copy them to controlled buffers
b. Track properly when to unmap them

Alternative a. has the same problems exposed in this RFC: it's hard
(and unneeded in the final version) to know the size to copy.
Alternative b. also requires things not needed in the final version,
like counting the number of times each page is mapped and unmapped.
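
To illustrate the bookkeeping b. would need (hypothetical, not part of any
patch): every guest page backing a CVQ buffer would carry a map refcount,
and could only be unmapped once the last in-flight command using it
completes:

typedef struct CVQPageRef {
    hwaddr gpa;         /* guest page mapped in the device */
    unsigned int refs;  /* in-flight CVQ commands still using the page */
} CVQPageRef;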

So I'll go with the first alternative, which is also the proposed order
of the RFC. What security issues do you expect beyond the comments in
this series?

Thanks!

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Jason Wang 1 year, 10 months ago
On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > [...]
>
> On second thought, that order is kind of harder.
>
> If we only map CVQ buffers, we need to either:
> a. Copy them to controlled buffers
> b. Track properly when to unmap them

Just to make sure we're at the same page:

I meant we can start with e.g having a dedicated ASID for CVQ but
still using CVQ passthrough.

Then do other stuff on top.

>
> Alternative a. has the same problems exposed in this RFC: it's hard
> (and unneeded in the final version) to know the size to copy.
> Alternative b. also requires things not needed in the final version,
> like counting the number of times each page is mapped and unmapped.
>
> So I'll go with the first alternative, which is also the proposed order
> of the RFC. What security issues do you expect beyond the comments in
> this series?

If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
try to peek/modify it?

Thanks

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Eugenio Perez Martin 1 year, 10 months ago
On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> > [...]
>
> Just to make sure we're at the same page:
>
> I meant we can start with e.g having a dedicated ASID for CVQ but
> still using CVQ passthrough.
>

That would imply duplicating all the memory listener updates to both
ASIDs. That part of the code needs to be reverted. I'm ok with that,
but I'm not sure if it's worth it to do it that way.

> Then do other stuff on top.
>
> >
> > Alternative a. has the same problems exposed in this RFC: it's hard
> > (and unneeded in the final version) to know the size to copy.
> > Alternative b. also requires things not needed in the final version,
> > like counting the number of times each page is mapped and unmapped.
> >
> > So I'll go with the first alternative, which is also the proposed order
> > of the RFC. What security issues do you expect beyond the comments in
> > this series?
>
> If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> try to peek/modify it?
>

It works the same way as data vqs, we're just updating the device
model in the middle. It should imply the exact same risk as updating
an emulated NIC control plane (including vhost-kernel / vhost-user).

Roughly speaking, it's just to propose patches 01 to 03, with your
comments. That already meets use cases like rx filter notifications
for devices with only one ASID.

Thanks!

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Jason Wang 1 year, 10 months ago
On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > [...]
> >
> > Just to make sure we're at the same page:
> >
> > I meant we can start with e.g having a dedicated ASID for CVQ but
> > still using CVQ passthrough.
> >
>
> That would imply duplicating all the memory listener updates to both
> ASIDs. That part of the code needs to be reverted. I'm ok with that,
> but I'm not sure if it's worth it to do it that way.

I don't get why it is related to memory listeners. The only change is

1) read the groups
2) set cvq to be an independent asid
3) update CVQ's IOTLB with its own ASID

?
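
Something like this, against the provisional uAPI (a sketch, error handling
omitted; vdpa_fd and cvq_index assumed in scope, CVQ_ASID being an arbitrary
nonzero id):

struct vhost_vring_state state = { .index = cvq_index };

ioctl(vdpa_fd, VHOST_VDPA_GET_VRING_GROUP, &state); /* 1) read CVQ's group */

state.index = state.num;                            /* the group id */
state.num = CVQ_ASID;
ioctl(vdpa_fd, VHOST_VDPA_SET_GROUP_ASID, &state);  /* 2) independent asid */

/* 3) From here on, CVQ IOTLB updates are sent against CVQ_ASID, while the
 * data vqs keep using the default ASID 0. */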

>
> > [...]
> >
> > If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> > try to peek/modify it?
> >
>
> It works the same way as data vqs, we're just updating the device
> model in the middle. It should imply the exact same risk as updating
> an emulated NIC control plane (including vhost-kernel / vhost-user).

Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
is owned by the guest.

But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
there's no way to prevent the guest from accessing it?

If, in the case of vhost-kernel/vhost-user, there were a way for the guest
to exploit buffers owned by QEMU, it would be a bug.

Thanks

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Eugenio Perez Martin 1 year, 10 months ago
On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > >
> > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > 2) ASID support for CVQ
> > > > > >
> > > > > > And for 1) we need to consider whether or not it could be simplified.
> > > > > >
> > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > issues.
> > > > > >
> > > > >
> > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > >
> > > >
> > > > On second thought, that order is kind of harder.
> > > >
> > > > If we only map CVQ buffers, we need to either:
> > > > a. Copy them to controlled buffers
> > > > b. Track properly when to unmap them
> > >
> > > Just to make sure we're on the same page:
> > >
> > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > still using CVQ passthrough.
> > >
> >
> > That would imply duplicating all the memory listener updates to both
> > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > but I'm not sure if it's worth it to do it that way.
>
> I don't get why it is related to memory listeners. The only change is
>
> 1) read the groups
> 2) set cvq to be an independent asid
> 3) update CVQ's IOTLB with its own ASID
>

How to track the mappings of step 3) without a copy?

If we don't copy the buffers to qemu's IOVA, we need to track when to
unmap the CVQ buffers' memory. Many CVQ buffers could be in the same page,
so we need to refcount them (or a similar solution).
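
For illustration, that per-page refcount bookkeeping would look something
like the sketch below (hypothetical names, not code from this series; the
actual IOTLB map/unmap calls are elided to comments):

#include <glib.h>
#include <stdint.h>
#include <stddef.h>

#define CVQ_PAGE_SIZE 4096ULL
#define CVQ_PAGE_MASK (~(CVQ_PAGE_SIZE - 1))

/* page base -> number of in-flight CVQ buffers touching that page;
 * created once with g_hash_table_new(NULL, NULL) */
static GHashTable *cvq_page_refs;

static void cvq_track_map(uint64_t addr, size_t len)
{
    for (uint64_t p = addr & CVQ_PAGE_MASK; p < addr + len;
         p += CVQ_PAGE_SIZE) {
        gpointer key = (gpointer)(uintptr_t)p;
        guint refs = GPOINTER_TO_UINT(g_hash_table_lookup(cvq_page_refs, key));

        if (refs == 0) {
            /* first buffer in this page: map the page in the device IOTLB */
        }
        g_hash_table_insert(cvq_page_refs, key, GUINT_TO_POINTER(refs + 1));
    }
}

static void cvq_track_unmap(uint64_t addr, size_t len)
{
    for (uint64_t p = addr & CVQ_PAGE_MASK; p < addr + len;
         p += CVQ_PAGE_SIZE) {
        gpointer key = (gpointer)(uintptr_t)p;
        guint refs = GPOINTER_TO_UINT(g_hash_table_lookup(cvq_page_refs, key));

        if (refs <= 1) {
            /* last buffer in this page completed: unmap it from the IOTLB */
            g_hash_table_remove(cvq_page_refs, key);
        } else {
            g_hash_table_insert(cvq_page_refs, key, GUINT_TO_POINTER(refs - 1));
        }
    }
}

This is exactly the bookkeeping that copying the buffers avoids.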

This series copies the buffers to an independent buffer in qemu memory
to avoid that. Once you copy them, we have the problem you point out at
some patch later: the guest controls the buffers, so qemu must understand
CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
At that point, we have this series except for the asid part and the
injection of CVQ buffers at the LM destination, isn't it?

CVQ buffers can be copied into the qemu IOVA space and offered to the
device. Much like the SVQ vrings, the copied buffers will not be
accessible from the guest. The hw device (as "non emulated cvq") will
receive a lot of dma updates, but that's temporary. We can add ASID on
top of that as a means to:
- Not shadow the data plane (fundamental to the intended use case of vdpa).
- Not pollute data plane DMA mappings.
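
A minimal sketch of that copy step (hypothetical names; the series' real
code also has to validate the command layout before forwarding it):

#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/* QEMU-owned buffer: the only CVQ memory ever mapped for the device */
static uint8_t cvq_bounce[4096];

static size_t cvq_copy_cmd(const struct iovec *out_sg, unsigned out_num)
{
    size_t copied = 0;

    for (unsigned i = 0; i < out_num && copied < sizeof(cvq_bounce); i++) {
        size_t n = out_sg[i].iov_len;

        if (n > sizeof(cvq_bounce) - copied) {
            n = sizeof(cvq_bounce) - copied;
        }
        memcpy(cvq_bounce + copied, out_sg[i].iov_base, n);
        copied += n;
    }

    /* the descriptor offered to the device points at cvq_bounce's IOVA,
     * never at guest memory */
    return copied;
}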

> ?
>
> >
> > > Then do other stuff on top.
> > >
> > > >
> > > > Alternative a. has the same problems exposed in this RFC: it's hard
> > > > (and unneeded in the final version) to know the size to copy.
> > > > Alternative b. also requires things not needed in the final version,
> > > > like counting the number of times each page is mapped and unmapped.
> > > >
> > > > So I'll go with the first alternative, which is also the proposed order
> > > > of the RFC. What security issues do you expect beyond the comments in
> > > > this series?
> > >
> > > If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> > > try to peek/modify it?
> > >
> >
> > It works the same way as data vqs, we're just updating the device
> > model in the middle. It should imply the exact same risk as updating
> > an emulated NIC control plane (including vhost-kernel / vhost-user).
>
> Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> is owned by guests.
>

The same way they control the data plane when all data virtqueues are
shadowed for dirty page tracking (more on the risk of qemu updating
the device model below).

> But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> there's no way to prevent guests from accessing it?
>

With SVQ, the memory exposed to the device is already shadowed. Guests
cannot access the CVQ buffers' memory, the same way they cannot access
the SVQ vrings.

> If in the case of vhost-kernel/vhost-user, there's a way for the guest
> to exploit buffers owned by Qemu, it should be a bug.
>

The only extra step is the call to virtio_net_handle_ctrl_iov
(extracted from virtio_net_handle_ctrl). If a guest can exploit that
in SVQ mode, it can exploit it with other vhost backends too, as far as
I see.
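
For reference, the extra step boils down to something like this (the
prototype follows the shape virtio_net_handle_ctrl_iov has in QEMU today,
which may differ in detail from patch 01; the caller is a hypothetical
sketch):

#include <sys/uio.h>
#include "hw/virtio/virtio.h"
#include "standard-headers/linux/virtio_net.h"

size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
                                  const struct iovec *in_sg, unsigned in_num,
                                  const struct iovec *out_sg,
                                  unsigned out_num);

/* Hypothetical: run one intercepted CVQ command through the device model */
static void svq_update_device_model(VirtIODevice *vdev,
                                    const struct iovec *out_sg,
                                    unsigned out_num)
{
    virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
    struct iovec in = { .iov_base = &status, .iov_len = sizeof(status) };

    virtio_net_handle_ctrl_iov(vdev, &in, 1, out_sg, out_num);
    /* status now holds the ack the guest reads back (VIRTIO_NET_OK on
     * success), and the VirtIONet model has been updated accordingly */
}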

> Thanks
>
> >
> > Roughly speaking, it's just to propose patches 01 to 03, with your
> > comments. That already meets use cases like rx filter notifications
> > for devices with only one ASID.
> >

This part of my mail is not correct; we need to add a few patches of
this series on top :). Otherwise, it would be exploitable.

Thanks!

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Jason Wang 1 year, 10 months ago
On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > >
> > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > >
> > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > 2) ASID support for CVQ
> > > > > > >
> > > > > > > And for 1) we need to consider whether or not it could be simplified.
> > > > > > >
> > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > issues.
> > > > > > >
> > > > > >
> > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > >
> > > > >
> > > > > On second thought, that order is kind of harder.
> > > > >
> > > > > If we only map CVQ buffers, we need to either:
> > > > > a. Copy them to controlled buffers
> > > > > b. Track properly when to unmap them
> > > >
> > > > Just to make sure we're on the same page:
> > > >
> > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > still using CVQ passthrough.
> > > >
> > >
> > > That would imply duplicating all the memory listener updates to both
> > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > but I'm not sure if it's worth it to do it that way.
> >
> > I don't get why it is related to memory listeners. The only change is
> >
> > 1) read the groups
> > 2) set cvq to be an independent asid
> > 3) update CVQ's IOTLB with its own ASID
> >
>
> How to track the mappings of step 3) without a copy?

So let me try to explain: what I propose is to split the patches, so
the above could be the first part. Since we know:

1) CVQ is passed through to the guest right now
2) CVQ will use an independent ASID

It doesn't harm to implement those first. It's unrelated to the policy
(e.g. how to shadow CVQ).
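
In code, steps 1) and 2) are roughly the following (the ioctl names are
the ones that later landed in the kernel uAPI; at the time of this thread
patch 11 still used provisional numbers):

#include <sys/ioctl.h>
#include <linux/vhost.h>

static int cvq_set_own_asid(int vdpa_fd, unsigned cvq_index, unsigned asid)
{
    struct vhost_vring_state state = { .index = cvq_index };
    int r;

    /* 1) read which group the CVQ vring belongs to */
    r = ioctl(vdpa_fd, VHOST_VDPA_GET_VRING_GROUP, &state);
    if (r) {
        return r;
    }

    /* 2) move that whole group to an independent ASID */
    state.index = state.num;   /* the group id is returned in .num */
    state.num = asid;
    r = ioctl(vdpa_fd, VHOST_VDPA_SET_GROUP_ASID, &state);

    /* 3) IOTLB updates for the CVQ would then carry this asid */
    return r;
}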

>
> If we don't copy the buffers to qemu's IOVA, we need to track when to
> unmap the CVQ buffers' memory. Many CVQ buffers could be in the same page,
> so we need to refcount them (or a similar solution).

Can we use a fixed mapping instead of the dynamic ones?

>
> This series copies the buffers to an independent buffer in qemu memory
> to avoid that. Once you copy them, we have the problem you point out at
> some patch later: the guest controls the buffers, so qemu must understand
> CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> At that point, we have this series except for the asid part and the
> injection of CVQ buffers at the LM destination, isn't it?

So we have several things:

1) ASID support
2) Shadow CVQ only
3) State restoring

I hope we can split them into independent series. If we want to shadow
CVQ first, we need to prove that it is safe without ASID.

>
> CVQ buffers can be copied into the qemu IOVA space and offered to the
> device. Much like the SVQ vrings, the copied buffers will not be
> accessible from the guest. The hw device (as "non emulated cvq") will
> receive a lot of dma updates, but that's temporary. We can add ASID on
> top of that as a means to:
> - Not shadow the data plane (fundamental to the intended use case of vdpa).
> - Not pollute data plane DMA mappings.
>
> > ?
> >
> > >
> > > > Then do other stuff on top.
> > > >
> > > > >
> > > > > Alternative a. has the same problems exposed in this RFC: it's hard
> > > > > (and unneeded in the final version) to know the size to copy.
> > > > > Alternative b. also requires things not needed in the final version,
> > > > > like counting the number of times each page is mapped and unmapped.
> > > > >
> > > > > So I'll go with the first alternative, which is also the proposed order
> > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > this series?
> > > >
> > > > If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> > > > try to peek/modify it?
> > > >
> > >
> > > It works the same way as data vqs, we're just updating the device
> > > model in the middle. It should imply the exact same risk as updating
> > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> >
> > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > is owned by guests.
> >
>
> The same way they control the data plane when all data virtqueues are
> shadowed for dirty page tracking (more on the risk of qemu updating
> the device model below).

Ok.

>
> > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > there's no way to prevent guests from accessing it?
> >
>
> With SVQ, the memory exposed to the device is already shadowed. Guests
> cannot access the CVQ buffers' memory, the same way they cannot access
> the SVQ vrings.

Ok, I think I kind of get you; it looks like we have different
assumptions here. If we only shadow CVQ, it will have security
issues, since RX/TX is not shadowed. If we shadow CVQ as well as
TX/RX, there's no security issue, since each IOVA is validated and the
descriptors are prepared by Qemu.
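
As an illustration of that assumption (hypothetical structures; QEMU's
real check goes through the SVQ IOVA tree), a guest address is forwarded
only if it falls entirely inside a mapping QEMU itself created:

#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

struct shadow_map {
    uint64_t gpa;     /* guest physical base */
    uint64_t iova;    /* address the device actually uses */
    size_t len;
};

static bool svq_translate(const struct shadow_map *maps, size_t n,
                          uint64_t gpa, size_t len, uint64_t *iova)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= maps[i].gpa &&
            gpa + len <= maps[i].gpa + maps[i].len) {
            *iova = maps[i].iova + (gpa - maps[i].gpa);
            return true;   /* validated: inside a QEMU-prepared mapping */
        }
    }
    return false;          /* out of range: never reaches the device */
}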

This goes back to another question: what's the order of the series?

Thanks


Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Eugenio Perez Martin 1 year, 10 months ago
On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > >
> > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > 2) ASID support for CVQ
> > > > > > > >
> > > > > > > > And for 1) we need to consider whether or not it could be simplified.
> > > > > > > >
> > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > issues.
> > > > > > > >
> > > > > > >
> > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > >
> > > > > >
> > > > > > On second thought, that order is kind of harder.
> > > > > >
> > > > > > If we only map CVQ buffers, we need to either:
> > > > > > a. Copy them to controlled buffers
> > > > > > b. Track properly when to unmap them
> > > > >
> > > > > Just to make sure we're on the same page:
> > > > >
> > > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > > still using CVQ passthrough.
> > > > >
> > > >
> > > > That would imply duplicating all the memory listener updates to both
> > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > but I'm not sure if it's worth it to do it that way.
> > >
> > > I don't get why it is related to memory listeners. The only change is
> > >
> > > 1) read the groups
> > > 2) set cvq to be an independent asid
> > > 3) update CVQ's IOTLB with its own ASID
> > >
> >
> > How to track the mappings of step 3) without a copy?
>
> So let me try to explain: what I propose is to split the patches, so
> the above could be the first part. Since we know:
>
> 1) CVQ is passed through to the guest right now
> 2) CVQ will use an independent ASID
>
> It doesn't harm to implement those first. It's unrelated to the policy
> (e.g. how to shadow CVQ).
>
> >
> > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > unmap the CVQ buffers' memory. Many CVQ buffers could be in the same page,
> > so we need to refcount them (or a similar solution).
>
> Can we use a fixed mapping instead of the dynamic ones?
>

That implies either implementing something like a memory ring (of what
size?), or effectively duplicating memory listener mappings.

I'm not against that, but it's something we need to remove in the
final solution. Using the order presented here will avoid that.

> >
> > This series copies the buffers to an independent buffer in qemu memory
> > to avoid that. Once you copy them, we have the problem you point out at
> > some patch later: the guest controls the buffers, so qemu must understand
> > CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> > At that point, we have this series except for the asid part and the
> > injection of CVQ buffers at the LM destination, isn't it?
>
> So we have several things:
>
> 1) ASID support
> 2) Shadow CVQ only
> 3) State restoring
>
> I hope we can split them into independent series. If we want to shadow
> CVQ first, we need to prove that it is safe without ASID.
>
> >
> > CVQ buffers can be copied into the qemu IOVA space and offered to the
> > device. Much like the SVQ vrings, the copied buffers will not be
> > accessible from the guest. The hw device (as "non emulated cvq") will
> > receive a lot of dma updates, but that's temporary. We can add ASID on
> > top of that as a means to:
> > - Not shadow the data plane (fundamental to the intended use case of vdpa).
> > - Not pollute data plane DMA mappings.
> >
> > > ?
> > >
> > > >
> > > > > Then do other stuff on top.
> > > > >
> > > > > >
> > > > > > Alternative a. has the same problems exposed in this RFC: it's hard
> > > > > > (and unneeded in the final version) to know the size to copy.
> > > > > > Alternative b. also requires things not needed in the final version,
> > > > > > like counting the number of times each page is mapped and unmapped.
> > > > > >
> > > > > > So I'll go with the first alternative, which is also the proposed order
> > > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > > this series?
> > > > >
> > > > > If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> > > > > try to peek/modify it?
> > > > >
> > > >
> > > > It works the same way as data vqs, we're just updating the device
> > > > model in the middle. It should imply the exact same risk as updating
> > > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> > >
> > > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > > is owned by guests.
> > >
> >
> > The same way they control the data plane when all data virtqueues are
> > shadowed for dirty page tracking (more on the risk of qemu updating
> > the device model below).
>
> Ok.
>
> >
> > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > there's no way to prevent guests from accessing it?
> > >
> >
> > With SVQ, the memory exposed to the device is already shadowed. Guests
> > cannot access the CVQ buffers' memory, the same way they cannot access
> > the SVQ vrings.
>
> Ok, I think I kind of get you; it looks like we have different
> assumptions here. If we only shadow CVQ, it will have security
> issues, since RX/TX is not shadowed. If we shadow CVQ as well as
> TX/RX, there's no security issue, since each IOVA is validated and the
> descriptors are prepared by Qemu.
>

Right. I expected to maintain the all-shadowed-or-nothing behavior,
sorry if I was not clear.

> This goes back to another question: what's the order of the series?
>

I think that the shortest path is to follow the order of this series.
I tried to reorder it your way, but the ASID patches have to come with
a lot of CVQ patches if we want proper validation.

We can take the long route if we implement either a fixed ring buffer,
memory listener cloning, or another use case (sub-slicing?), but I
expect more issues to arise there.

Actually, I have another question: is it ok to implement the cvq use
case but not merge the x-svq parameter? The more I think about the
parameter, the more I see it's better to leave it as a separate patch
for testing until we shape the complete series, where it's unneeded.

Thanks!

Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Jason Wang 1 year, 10 months ago
On Wed, Jun 15, 2022 at 6:03 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > > >
> > > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > > 2) ASID support for CVQ
> > > > > > > > >
> > > > > > > > > And for 1) we need to consider whether or not it could be simplified.
> > > > > > > > >
> > > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > > issues.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > > >
> > > > > > >
> > > > > > > On second thought, that order is kind of harder.
> > > > > > >
> > > > > > > If we only map CVQ buffers, we need to either:
> > > > > > > a. Copy them to controlled buffers
> > > > > > > b. Track properly when to unmap them
> > > > > >
> > > > > > Just to make sure we're on the same page:
> > > > > >
> > > > > > I meant we can start with e.g having a dedicated ASID for CVQ but
> > > > > > still using CVQ passthrough.
> > > > > >
> > > > >
> > > > > That would imply duplicating all the memory listener updates to both
> > > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > > but I'm not sure if it's worth it to do it that way.
> > > >
> > > > I don't get why it is related to memory listeners. The only change is
> > > >
> > > > 1) read the groups
> > > > 2) set cvq to be an independent asid
> > > > 3) update CVQ's IOTLB with its own ASID
> > > >
> > >
> > > How to track the mappings of step 3) without a copy?
> >
> > So let me try to explain: what I propose is to split the patches, so
> > the above could be the first part. Since we know:
> >
> > 1) CVQ is passed through to the guest right now
> > 2) CVQ will use an independent ASID
> >
> > It doesn't harm to implement those first. It's unrelated to the policy
> > (e.g. how to shadow CVQ).
> >
> > >
> > > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > > unmap the CVQ buffers' memory. Many CVQ buffers could be in the same page,
> > > so we need to refcount them (or a similar solution).
> >
> > Can we use a fixed mapping instead of the dynamic ones?
> >
>
> That implies either implementing something like a memory ring (of what
> size?), or effectively duplicating memory listener mappings.

I'm not sure I get this.

But it's mainly the CVQ buffer + the CVQ virtqueue.

It should be possible if we:

1) allocate something like a buffer of several megabytes
2) only process one CVQ command from the guest at a time

?
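
Something like the following sketch, then (hypothetical names;
qemu_memalign() is QEMU's aligned allocator):

#include <stdbool.h>
#include <stddef.h>
#include <string.h>

void *qemu_memalign(size_t alignment, size_t size); /* from QEMU's utils */

#define CVQ_BOUNCE_SIZE (4 * 1024 * 1024)

static void *cvq_bounce;      /* mapped into the CVQ ASID once, at start */
static bool cvq_cmd_in_flight;

static void cvq_fixed_start(void)
{
    cvq_bounce = qemu_memalign(4096, CVQ_BOUNCE_SIZE);
    /* map [iova, iova + CVQ_BOUNCE_SIZE) here, once; never remapped */
}

static bool cvq_submit_one(const void *cmd, size_t len)
{
    if (cvq_cmd_in_flight || len > CVQ_BOUNCE_SIZE) {
        return false;         /* serialize: one command at a time */
    }
    memcpy(cvq_bounce, cmd, len);
    cvq_cmd_in_flight = true;
    /* kick the device; poll the used ring before accepting the next one */
    return true;
}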

>
> I'm not against that, but it's something we need to remove in the
> final solution. Using the order presented here will avoid that.
>
> > >
> > > This series copies the buffers to an independent buffer in qemu memory
> > > to avoid that. Once you copy them, we have the problem you point out at
> > > some patch later: the guest controls the buffers, so qemu must understand
> > > CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> > > At that point, we have this series except for the asid part and the
> > > injection of CVQ buffers at the LM destination, isn't it?
> >
> > So we have several things:
> >
> > 1) ASID support
> > 2) Shadow CVQ only
> > 3) State restoring
> >
> > I hope we can split them into independent series. If we want to shadow
> > CVQ first, we need to prove that it is safe without ASID.
> >
> > >
> > > CVQ buffers can be copied into the qemu IOVA space and offered to the
> > > device. Much like the SVQ vrings, the copied buffers will not be
> > > accessible from the guest. The hw device (as "non emulated cvq") will
> > > receive a lot of dma updates, but that's temporary. We can add ASID on
> > > top of that as a means to:
> > > - Not shadow the data plane (fundamental to the intended use case of vdpa).
> > > - Not pollute data plane DMA mappings.
> > >
> > > > ?
> > > >
> > > > >
> > > > > > Then do other stuff on top.
> > > > > >
> > > > > > >
> > > > > > > Alternative a. has the same problems exposed in this RFC: it's hard
> > > > > > > (and unneeded in the final version) to know the size to copy.
> > > > > > > Alternative b. also requires things not needed in the final version,
> > > > > > > like counting the number of times each page is mapped and unmapped.
> > > > > > >
> > > > > > > So I'll go with the first alternative, which is also the proposed order
> > > > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > > > this series?
> > > > > >
> > > > > > If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> > > > > > try to peek/modify it?
> > > > > >
> > > > >
> > > > > It works the same way as data vqs, we're just updating the device
> > > > > model in the middle. It should imply the exact same risk as updating
> > > > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> > > >
> > > > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > > > is owned by guests.
> > > >
> > >
> > > The same way they control the data plane when all data virtqueues are
> > > shadowed for dirty page tracking (more on the risk of qemu updating
> > > the device model below).
> >
> > Ok.
> >
> > >
> > > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > > there's no way to prevent guests from accessing it?
> > > >
> > >
> > > With SVQ, the memory exposed to the device is already shadowed. Guests
> > > cannot access the CVQ buffers' memory, the same way they cannot access
> > > the SVQ vrings.
> >
> > Ok, I think I kind of get you; it looks like we have different
> > assumptions here. If we only shadow CVQ, it will have security
> > issues, since RX/TX is not shadowed. If we shadow CVQ as well as
> > TX/RX, there's no security issue, since each IOVA is validated and the
> > descriptors are prepared by Qemu.
> >
>
> Right. I expected to maintain the all-shadowed-or-nothing behavior,
> sorry if I was not clear.
>
> > This goes back to another question: what's the order of the series?
> >
>
> I think that the shortest path is to follow the order of this series.
> I tried to reorder it your way, but the ASID patches have to come with
> a lot of CVQ patches if we want proper validation.

Ok, so if this is the case, let's just split this series and keep the order.

>
> We can take the long route if we implement either a fixed ring buffer,
> memory listener cloning, or another use case (sub-slicing?), but I
> expect more issues to arise there.
>
> Actually, I have another question: is it ok to implement the cvq use
> case but not merge the x-svq parameter? The more I think about the
> parameter, the more I see it's better to leave it as a separate patch
> for testing until we shape the complete series, where it's unneeded.

That's fine.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > > If in the case of vhost-kernel/vhost-user, there's a way for the guest
> > > > to exploit buffers owned by Qemu, it should be a bug.
> > > >
> > >
> > > The only extra step is the call to virtio_net_handle_ctrl_iov
> > > (extracted from virtio_net_handle_ctrl). If a guest can exploit that
> > > in SVQ mode, it can exploit it too with other vhost backends as far as
> > > I see.
> > >
> > > > Thanks
> > > >
> > > > >
> > > > > Roughly speaking, it's just to propose patches 01 to 03, with your
> > > > > comments. That already meets use cases like rx filter notifications
> > > > > for devices with only one ASID.
> > > > >
> > >
> > > This part of my mail is not correct, we need to add a few patches of
> > > this series on top :). If not, it would be exploitable.
> > >
> > > Thanks!
> > >
> > > > > Thanks!
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > > Can we do as with previous base SVQ patches? they were merged although
> > > > > > > > there is still no way to enable SVQ.
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Eugenio Perez Martin 1 year, 10 months ago
On Fri, Jun 17, 2022 at 3:29 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Wed, Jun 15, 2022 at 6:03 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> > > <eperezma@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > > > <eperezma@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > > > > > > > > [...]
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > > > >
> > > > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > > > 2) ASID support for CVQ
> > > > > > > > > >
> > > > > > > > > > And for 1) we need to consider whether or not it could be simplified.
> > > > > > > > > >
> > > > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > > > issues.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > > > >
> > > > > > > >
> > > > > > > > On second thought, that order is kind of harder.
> > > > > > > >
> > > > > > > > If we only map CVQ buffers, we need to either:
> > > > > > > > a. Copy them to controlled buffers
> > > > > > > > b. Track properly when to unmap them
> > > > > > >
> > > > > > > Just to make sure we're on the same page:
> > > > > > >
> > > > > > > I meant we can start with e.g. having a dedicated ASID for CVQ but
> > > > > > > still using CVQ passthrough.
> > > > > > >
> > > > > >
> > > > > > That would imply duplicating all the memory listener updates to both
> > > > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > > > but I'm not sure if it's worth it to do it that way.
> > > > >
> > > > > I don't get why it is related to memory listeners. The only change is
> > > > >
> > > > > 1) read the groups
> > > > > 2) set cvq to be an independent asid
> > > > > 3) update CVQ's IOTLB with its own ASID
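
A rough sketch of those three steps, using the ioctl names from the
proposed kernel series [1] (the ioctl numbers were still provisional at
this point, and the helper itself is made up):

static int vhost_vdpa_cvq_isolate(int device_fd, unsigned cvq_index,
                                  uint32_t cvq_asid)
{
    struct vhost_vring_state state = { .index = cvq_index };

    /* 1) read which group the control virtqueue belongs to */
    if (ioctl(device_fd, VHOST_VDPA_GET_VRING_GROUP, &state)) {
        return -errno;
    }

    /* 2) move that group to an independent ASID */
    state.index = state.num;    /* group id returned by the call above */
    state.num = cvq_asid;
    if (ioctl(device_fd, VHOST_VDPA_SET_GROUP_ASID, &state)) {
        return -errno;
    }

    /* 3) from now on, IOTLB updates for CVQ must carry cvq_asid in the
     * vhost_msg_v2 asid field (gated by VHOST_BACKEND_F_IOTLB_ASID). */
    return 0;
}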
> > > > >
> > > >
> > > > How to track the mappings of step 3) without a copy?
> > >
> > > So let me try to explain: what I propose is to split the patches, so
> > > the above could be the first part. Since we know:
> > >
> > > 1) CVQ is passthrough to the guest right now
> > > 2) CVQ will use an independent ASID
> > >
> > > It does no harm to implement those first. It's unrelated to the policy
> > > (e.g. how to shadow CVQ).
> > >
> > > >
> > > > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > > > unmap the CVQ buffers' memory. Many CVQ buffers could be in the same
> > > > page, so we need to refcount them (or a similar solution).
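
To make that bookkeeping concrete, it would need something like a
per-page refcount; this is purely illustrative, none of these names
exist in the series:

typedef struct CVQPagePins {
    GHashTable *refs;   /* guest page -> number of in-flight CVQ buffers */
} CVQPagePins;

/* Returns true when the page goes 0 -> 1 and the caller must map it. */
static bool cvq_page_pin(CVQPagePins *p, hwaddr page)
{
    gpointer key = (gpointer)(uintptr_t)page;
    guint n = GPOINTER_TO_UINT(g_hash_table_lookup(p->refs, key));

    g_hash_table_insert(p->refs, key, GUINT_TO_POINTER(n + 1));
    return n == 0;
}

/* Returns true when the page goes 1 -> 0 and the caller must unmap it. */
static bool cvq_page_unpin(CVQPagePins *p, hwaddr page)
{
    gpointer key = (gpointer)(uintptr_t)page;
    guint n = GPOINTER_TO_UINT(g_hash_table_lookup(p->refs, key));

    if (n <= 1) {
        g_hash_table_remove(p->refs, key);
        return true;
    }
    g_hash_table_insert(p->refs, key, GUINT_TO_POINTER(n - 1));
    return false;
}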
> > >
> > > Can we use fixed mapping instead of the dynamic ones?
> > >
> >
> > That implies either implementing something like a memory ring (of what
> > size?), or effectively duplicating the memory listener mappings.
>
> I'm not sure I get this.
>
> But it's mainly the CVQ buffer + CVQ virtqueue.
>
> It should be possible if:
>
> 1) allocate something like a buffer of several megabytes

It's technically possible, but we would need to deal with situations
that will not happen in the final version, once we teach qemu how to
handle CVQ. For example, what do we do if a command does not fit?

The current workflow deals with it automatically, as we teach qemu
about CVQ before splitting it into a separate ASID. The big buffer
looks like a good *transversal* optimization to me. For example, when
indirect descriptors are supported, we will need something like that
so we don't abuse map/unmap ops. CVQ can use it too. But it will be
better if we provide it with a good default plus a tunable, IMO.

> 2) only process one CVQ command from guest at once
>

I don't get why that's needed; is it to make sure CVQ never fills that
buffer? It should be easy to copy as many of the guest's CVQ buffers
as possible there and then stop when it's full.
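
For reference, the fixed area would boil down to something like a bump
allocator over one buffer that is mapped into the device once at start;
sizes and names here are made up for illustration:

typedef struct CVQArena {
    void *base;     /* host VA of the premapped area */
    size_t size;    /* e.g. a few MiB, ideally with a tunable */
    size_t used;
} CVQArena;

/* Copy one guest CVQ buffer into the arena. A NULL return means the
 * arena is full and the caller must stop taking guest commands until
 * the in-flight ones complete and the arena is reset. */
static void *cvq_arena_copy(CVQArena *a, const void *buf, size_t len)
{
    void *dst;

    if (len > a->size - a->used) {
        return NULL;
    }
    dst = (char *)a->base + a->used;
    memcpy(dst, buf, len);
    a->used += len;
    return dst;
}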

> ?
>
> >
> > I'm not against that, but it's something we will need to remove in the
> > final solution. Using the order presented here will avoid that.
> >
> > > >
> > > > This series copies the buffers to an independent buffer in qemu memory
> > > > to avoid that. Once you copy them, we have the problem you point out at
> > > > some later patch: the guest controls the buffers, so qemu must understand
> > > > CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> > > > At that point, we have this series except for the asid part and the
> > > > injection of CVQ buffers at the LM destination, isn't it?
> > >
> > > So we have several things:
> > >
> > > 1) ASID support
> > > 2) Shadow CVQ only
> > > 3) State restoring
> > >
> > > I hope we can split them into independent series. If we want to shadow
> > > CVQ first, we need to prove that it is safe without ASID.
> > >
> > > >
> > > > CVQ buffers can be copied into the qemu IOVA space and offered to the
> > > > device. Much like the SVQ vrings, the copied buffers will not be
> > > > accessible from the guest. The hw device (as "non emulated cvq") will
> > > > receive a lot of dma updates, but that's temporary. We can add ASID on
> > > > top of that as a means to:
> > > > - Not shadow the data plane (fundamental to the intended use case of vdpa).
> > > > - Not pollute the data plane DMA mappings.
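
As a sketch of that idea, mapping a qemu-owned bounce buffer would look
roughly like this, reusing the IOVA tree and map helpers this series
already touches (simplified, error handling omitted):

static int cvq_map_bounce(struct vhost_vdpa *v, VhostIOVATree *tree,
                          void *bounce, size_t size, hwaddr *iova)
{
    DMAMap map = {
        .translated_addr = (hwaddr)(uintptr_t)bounce,
        .size = size - 1,       /* DMAMap sizes are inclusive */
        .perm = IOMMU_RW,
    };

    if (vhost_iova_tree_map_alloc(tree, &map) != IOVA_OK) {
        return -ENOMEM;
    }
    *iova = map.iova;

    /* The guest never learns this iova: it only exists in the shadow
     * translations, exactly like the SVQ vrings themselves. */
    return vhost_vdpa_dma_map(v, map.iova, size, bounce, false);
}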
> > > >
> > > > > ?
> > > > >
> > > > > >
> > > > > > > Then do other stuff on top.
> > > > > > >
> > > > > > > >
> > > > > > > > Alternative a. has the same problems exposed in this RFC: it's hard
> > > > > > > > (and unneeded in the final version) to know the size to copy.
> > > > > > > > Alternative b. also requires things not needed in the final version,
> > > > > > > > like counting the number of times each page is mapped and unmapped.
> > > > > > > >
> > > > > > > > So I'll go with the first alternative, which is also the proposed order
> > > > > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > > > > this series?
> > > > > > >
> > > > > > > If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> > > > > > > try to peek at/modify it?
> > > > > > >
> > > > > >
> > > > > > It works the same way as the data vqs; we're just updating the device
> > > > > > model in the middle. It should imply the exact same risk as updating
> > > > > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> > > > >
> > > > > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > > > > is owned by guests.
> > > > >
> > > >
> > > > The same way they control the data plane when all data virtqueues are
> > > > shadowed for dirty page tracking (more on the risk of qemu updating
> > > > the device model below).
> > >
> > > Ok.
> > >
> > > >
> > > > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > > > there's no way to prevent guests from accessing it?
> > > > >
> > > >
> > > > With SVQ, the memory exposed to the device is already shadowed. The
> > > > guest cannot access the CVQ buffers' memory, the same way it cannot
> > > > access the SVQ vrings.
> > >
> > > Ok, I think I kind of get you; it looks like we have different
> > > assumptions here. If we only shadow CVQ, it will have security
> > > issues, since RX/TX is not shadowed. If we shadow CVQ as well as
> > > TX/RX, there's no security issue, since each IOVA is validated and the
> > > descriptors are prepared by Qemu.
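
The validation referred to here is essentially a lookup in qemu's IOVA
tree before any guest address reaches the device; roughly (the helper
name is hypothetical, vhost_iova_tree_find_iova() is from the SVQ code):

static bool svq_translate_addr(const VhostIOVATree *tree,
                               hwaddr gpa, size_t len, hwaddr *iova)
{
    const DMAMap needle = {
        .translated_addr = gpa,
        .size = len - 1,        /* DMAMap sizes are inclusive */
    };
    const DMAMap *map = vhost_iova_tree_find_iova(tree, &needle);

    /* A descriptor outside the shadowed mappings is simply rejected. */
    if (!map) {
        return false;
    }
    *iova = map->iova + (gpa - map->translated_addr);
    return true;
}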
> > >
> >
> > Right. I expected to maintain the all-shadowed-or-nothing behavior,
> > sorry if I was not clear.
> >
> > > This goes back to another question, what's the order of the series.
> > >
> >
> > I think that the shortest path is to follow the order of this series.
> > I tried to reorder it your way, but the ASID patches have to come
> > with a lot of CVQ patches if we want proper validation.
>
> Ok, so if this is the case, let's just split this series and keep the order.
>
> >
> > We can take the long route if we implement a fixed ring buffer,
> > memory listener cloning, or another use case (sub-slicing?). But I
> > expect more issues to arise there.
> >
> > I have another question, actually: is it ok to implement the cvq use
> > case but not to merge the x-svq parameter? The more I think about the
> > parameter, the more I see it's better to leave it as a separate patch
> > for testing until we shape the complete series and it becomes unneeded.
>
> That's fine.
>
> Thanks
>
> >
> > Thanks!
> >
> > > Thanks
> > >
> > >
> > > >
> > > > > If in the case of vhost-kernel/vhost-user, there's a way for the guest
> > > > > to exploit buffers owned by Qemu, it should be a bug.
> > > > >
> > > >
> > > > The only extra step is the call to virtio_net_handle_ctrl_iov
> > > > (extracted from virtio_net_handle_ctrl). If a guest can exploit that
> > > > in SVQ mode, it can exploit it too with other vhost backends as far as
> > > > I see.
> > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Roughly speaking, it's just to propose patches 01 to 03, with your
> > > > > > comments. That already meets use cases like rx filter notifications
> > > > > > for devices with only one ASID.
> > > > > >
> > > >
> > > > This part of my mail is not correct, we need to add a few patches of
> > > > this series on top :). If not, it would be exploitable.
> > > >
> > > > Thanks!
> > > >
> > > > > > Thanks!
> > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > >
> > > > > > > > > Can we do as with previous base SVQ patches? They were merged although
> > > > > > > > > there is still no way to enable SVQ.
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > >
Re: [RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ
Posted by Jason Wang 1 year, 10 months ago
On Fri, Jun 17, 2022 at 4:17 PM Eugenio Perez Martin
<eperezma@redhat.com> wrote:
>
> On Fri, Jun 17, 2022 at 3:29 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Jun 15, 2022 at 6:03 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Wed, Jun 15, 2022 at 5:04 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 14, 2022 at 5:32 PM Eugenio Perez Martin
> > > > <eperezma@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 14, 2022 at 10:20 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2022 at 4:14 PM Eugenio Perez Martin
> > > > > > <eperezma@redhat.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 14, 2022 at 10:02 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 14, 2022 at 12:32 AM Eugenio Perez Martin
> > > > > > > > <eperezma@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Jun 8, 2022 at 9:28 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 8, 2022 at 7:51 AM Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On 2022/5/20 03:12, Eugenio Pérez wrote:
> > > > > > > > > > > > [...]
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > As discussed, I think we need to split this huge series into smaller ones:
> > > > > > > > > > >
> > > > > > > > > > > 1) shadow CVQ only, this makes rx-filter-event work
> > > > > > > > > > > 2) ASID support for CVQ
> > > > > > > > > > >
> > > > > > > > > > > And for 1) we need to consider whether or not it could be simplified.
> > > > > > > > > > >
> > > > > > > > > > > Or do it in reverse order, since if we do 1) first, we may have security
> > > > > > > > > > > issues.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I'm ok with both, but I also think 2) before 1) might make more sense.
> > > > > > > > > > There is no way to only shadow CVQ otherwise ATM.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > On second thought, that order is kind of harder.
> > > > > > > > >
> > > > > > > > > If we only map CVQ buffers, we need to either:
> > > > > > > > > a. Copy them to controlled buffers
> > > > > > > > > b. Track properly when to unmap them
> > > > > > > >
> > > > > > > > Just to make sure we're on the same page:
> > > > > > > >
> > > > > > > > I meant we can start with e.g. having a dedicated ASID for CVQ but
> > > > > > > > still using CVQ passthrough.
> > > > > > > >
> > > > > > >
> > > > > > > That would imply duplicating all the memory listener updates to both
> > > > > > > ASIDs. That part of the code needs to be reverted. I'm ok with that,
> > > > > > > but I'm not sure if it's worth it to do it that way.
> > > > > >
> > > > > > I don't get why it is related to memory listeners. The only change is
> > > > > >
> > > > > > 1) read the groups
> > > > > > 2) set cvq to be an independent asid
> > > > > > 3) update CVQ's IOTLB with its own ASID
> > > > > >
> > > > >
> > > > > How to track the mappings of step 3) without a copy?
> > > >
> > > > So let me try to explain: what I propose is to split the patches, so
> > > > the above could be the first part. Since we know:
> > > >
> > > > 1) CVQ is passthrough to the guest right now
> > > > 2) CVQ will use an independent ASID
> > > >
> > > > It does no harm to implement those first. It's unrelated to the policy
> > > > (e.g. how to shadow CVQ).
> > > >
> > > > >
> > > > > If we don't copy the buffers to qemu's IOVA, we need to track when to
> > > > > unmap the CVQ buffers' memory. Many CVQ buffers could be in the same
> > > > > page, so we need to refcount them (or a similar solution).
> > > >
> > > > Can we use fixed mapping instead of the dynamic ones?
> > > >
> > >
> > > That implies either implementing something like a memory ring (of what
> > > size?), or effectively duplicating the memory listener mappings.
> >
> > I'm not sure I get this.
> >
> > But it's mainly the CVQ buffer + CVQ virtqueue.
> >
> > It should be possible if:
> >
> > 1) allocate something like a buffer of several megabytes
>
> It's technically possible, but we would need to deal with situations
> that will not happen in the final version, once we teach qemu how to
> handle CVQ. For example, what do we do if a command does not fit?

Then double the size of the area? For CVQ, Qemu should know the
maximum size of the request, otherwise it would be another blocker for
live migration.
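
For reference, a sketch of such a bound, assuming qemu caps MAC table
updates at MAC_TABLE_ENTRIES (64) the way the emulated virtio-net device
already does; MAC_TABLE_SET is the largest variable-sized command, and
it carries two MAC tables (unicast and multicast):

static size_t cvq_cmd_len_max(void)
{
    /* ctrl header, plus two table headers (the entry counts), plus at
     * most MAC_TABLE_ENTRIES 6-byte MACs per table under the assumption
     * above. */
    return sizeof(struct virtio_net_ctrl_hdr) +
           2 * sizeof(struct virtio_net_ctrl_mac) +
           2 * MAC_TABLE_ENTRIES * ETH_ALEN;
}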

>
> The current workflow deals with it automatically, as we teach qemu
> about CVQ before splitting it into a separate ASID. The big buffer
> looks like a good *transversal* optimization to me. For example, when
> indirect descriptors are supported, we will need something like that
> so we don't abuse map/unmap ops. CVQ can use it too. But it will be
> better if we provide it with a good default plus a tunable, IMO.

That's fine.

>
> > 2) only process one CVQ command from guest at once
> >
>
> I don't get why that's needed; is it to make sure CVQ never fills that
> buffer? It should be easy to copy as many of the guest's CVQ buffers
> as possible there and then stop when it's full.

It's not a must, just a proposal to start from something that is simpler ...

Thanks

>
> > ?
> >
> > >
> > > I'm not against that, but it's something we will need to remove in the
> > > final solution. Using the order presented here will avoid that.
> > >
> > > > >
> > > > > This series copies the buffers to an independent buffer in qemu memory
> > > > > to avoid that. Once you copy them, we have the problem you point out at
> > > > > some later patch: the guest controls the buffers, so qemu must understand
> > > > > CVQ so the guest cannot trick it. All of this is orthogonal to ASID.
> > > > > At that point, we have this series except for the asid part and the
> > > > > injection of CVQ buffers at the LM destination, isn't it?
> > > >
> > > > So we have several things:
> > > >
> > > > 1) ASID support
> > > > 2) Shadow CVQ only
> > > > 3) State restoring
> > > >
> > > > I hope we can split them into independent series. If we want to shadow
> > > > CVQ first, we need to prove that it is safe without ASID.
> > > >
> > > > >
> > > > > CVQ buffers can be copied into the qemu IOVA space and offered to the
> > > > > device. Much like the SVQ vrings, the copied buffers will not be
> > > > > accessible from the guest. The hw device (as "non emulated cvq") will
> > > > > receive a lot of dma updates, but that's temporary. We can add ASID on
> > > > > top of that as a means to:
> > > > > - Not shadow the data plane (fundamental to the intended use case of vdpa).
> > > > > - Not pollute the data plane DMA mappings.
> > > > >
> > > > > > ?
> > > > > >
> > > > > > >
> > > > > > > > Then do other stuff on top.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Alternative a. has the same problems exposed in this RFC: it's hard
> > > > > > > > > (and unneeded in the final version) to know the size to copy.
> > > > > > > > > Alternative b. also requires things not needed in the final version,
> > > > > > > > > like counting the number of times each page is mapped and unmapped.
> > > > > > > > >
> > > > > > > > > So I'll go with the first alternative, which is also the proposed order
> > > > > > > > > of the RFC. What security issues do you expect beyond the comments in
> > > > > > > > > this series?
> > > > > > > >
> > > > > > > > If we shadow CVQ without ASID, the guest may guess the IOVA of CVQ and
> > > > > > > > try to peek at/modify it?
> > > > > > > >
> > > > > > >
> > > > > > > It works the same way as the data vqs; we're just updating the device
> > > > > > > model in the middle. It should imply the exact same risk as updating
> > > > > > > an emulated NIC control plane (including vhost-kernel / vhost-user).
> > > > > >
> > > > > > Not sure I got you here. For vhost-kernel and vhost-user, CVQ's buffer
> > > > > > is owned by guests.
> > > > > >
> > > > >
> > > > > The same way they control the data plane when all data virtqueues are
> > > > > shadowed for dirty page tracking (more on the risk of qemu updating
> > > > > the device model below).
> > > >
> > > > Ok.
> > > >
> > > > >
> > > > > > But if we shadow CVQ without ASID, the CVQ buffer is owned by QEMU and
> > > > > > there's no way to prevent guests from accessing it?
> > > > > >
> > > > >
> > > > > With SVQ, the memory exposed to the device is already shadowed. The
> > > > > guest cannot access the CVQ buffers' memory, the same way it cannot
> > > > > access the SVQ vrings.
> > > >
> > > > Ok, I think I kind of get you; it looks like we have different
> > > > assumptions here. If we only shadow CVQ, it will have security
> > > > issues, since RX/TX is not shadowed. If we shadow CVQ as well as
> > > > TX/RX, there's no security issue, since each IOVA is validated and the
> > > > descriptors are prepared by Qemu.
> > > >
> > >
> > > Right. I expected to maintain the all-shadowed-or-nothing behavior,
> > > sorry if I was not clear.
> > >
> > > > This goes back to another question, what's the order of the series.
> > > >
> > >
> > > I think that the shortest path is to follow the order of this series.
> > > I tried to reorder it your way, but the ASID patches have to come
> > > with a lot of CVQ patches if we want proper validation.
> >
> > Ok, so if this is the case, let's just split this series and keep the order.
> >
> > >
> > > We can take the long route if we implement a fixed ring buffer,
> > > memory listener cloning, or another use case (sub-slicing?). But I
> > > expect more issues to arise there.
> > >
> > > I have another question, actually: is it ok to implement the cvq use
> > > case but not to merge the x-svq parameter? The more I think about the
> > > parameter, the more I see it's better to leave it as a separate patch
> > > for testing until we shape the complete series and it becomes unneeded.
> >
> > That's fine.
> >
> > Thanks
> >
> > >
> > > Thanks!
> > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > > If in the case of vhost-kernel/vhost-user, there's a way for the guest
> > > > > > to exploit buffers owned by Qemu, it should be a bug.
> > > > > >
> > > > >
> > > > > The only extra step is the call to virtio_net_handle_ctrl_iov
> > > > > (extracted from virtio_net_handle_ctrl). If a guest can exploit that
> > > > > in SVQ mode, it can exploit it too with other vhost backends as far as
> > > > > I see.
> > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Roughly speaking, it's just to propose patches 01 to 03, with your
> > > > > > > comments. That already meets use cases like rx filter notifications
> > > > > > > for devices with only one ASID.
> > > > > > >
> > > > >
> > > > > This part of my mail is not correct, we need to add a few patches of
> > > > > this series on top :). If not, it would be exploitable.
> > > > >
> > > > > Thanks!
> > > > >
> > > > > > > Thanks!
> > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > > Can we do as with previous base SVQ patches? They were merged although
> > > > > > > > > > there is still no way to enable SVQ.
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > >
> > > > > > > > > > > Thoughts?
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > >