[Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances

Peter Xu posted 17 patches 7 years, 1 month ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/1486456099-7345-1-git-send-email-peterx@redhat.com
Test checkpatch passed
Test docker passed
Test s390x passed
There is a newer version of this series
[Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Posted by Peter Xu 7 years, 1 month ago
This is v7 of vt-d vfio enablement series.

v7:
- for the two tracing patches: change subjects; remove the vtd_err()
  and vtd_err_nonzero_rsvd() tracers, using a standalone trace for
  each place instead; don't remove any DPRINTF() if there is no
  replacement. [Jason]
- add r-b and a-b for Alex/David/Jason.
- in patch "intel_iommu: renaming gpa to iova where proper", convert
  one more place where I missed [Jason]
- fix the place where I should use "~0ULL" not "~0" [Jason]
- squash patch 16 into 18 [Jason]

v6:
- do unmap in all cases when replay [Jason]
- do global replay even if context entry is invalidated [Jason]
- when iommu reset, send unmap to all registered notifiers [Jason]
- use rcu read lock to protect the whole vfio_iommu_map_notify()
  [Alex, Paolo]

v5:
- fix patch 4's over-long subject and a spelling error [Eric]
- add ack-by for Alex in patch 1 [Alex]
- squash patches 19/20 into patch 18 [Jason]
- fix comments in vtd_page_walk() [Jason]
- remove all error_report() [Jason]
- add comment for patch 18, mentioning that it enables vhost without
  ATS as well [Jason]
- remove skipped debug thing during page walk [Jason]
- remove duplicated page walk trace [Jason]
- some tuning in vtd_address_space_unmap() to provide the correct iova
  and addr_mask. For this, I also tuned the patch
  "memory: add section range info for IOMMU notifier"
  a bit, to loosen the range check

v4:
- convert all error_report()s into traces (in the two patches that did
  that)
- rebased to Jason's DMAR series (master + one more patch:
  "[PATCH V4 net-next] vhost_net: device IOTLB support")
- let vhost use the new api iommu_notifier_init() so it won't break
  vhost dmar [Jason]
- touch commit message of the patch:
  "intel_iommu: provide its own replay() callback"
  the old replay is not a dead loop, but it can consume lots of time
  [Jason]
- add comment for patch:
  "intel_iommu: do replay when context invalidate"
  telling why replay won't be a problem even without CM=1 [Jason]
- remove a useless comment line [Jason]
- remove dmar_enabled parameter for vtd_switch_address_space() and
  vtd_switch_address_space_all() [Mst, Jason]
- merged the vfio patches in, to support unmap of big ranges at the
  beginning ("[PATCH RFC 0/3] vfio: allow to notify unmap for very big
  region")
- using caching_mode instead of cache_mode_enabled, and "caching-mode"
  instead of "cache-mode" [Kevin]
- when receiving a context entry invalidation, we unmap the entire
  region first, then replay [Alex]
- fix commit message for patch:
  "intel_iommu: simplify irq region translation" [Kevin]
- handle domain/global invalidation, and notify where appropriate
  [Jason, Kevin]

v3:
- fix style error reported by patchew
- fix comment in domain switch patch: use "IOMMU address space" rather
  than "IOMMU region" [Kevin]
- add ack-by for Paolo in patch:
  "memory: add section range info for IOMMU notifier"
  (this was collected separately, outside this thread)
- remove 3 patches which are merged already (from Jason)
- rebase to master b6c0897

v2:
- change comment for "end" parameter in vtd_page_walk() [Tianyu]
- change comment for "a iova" to "an iova" [Yi]
- fix fault printed val for GPA address in vtd_page_walk_level (debug
  only)
- rebased to master (rather than Aviv's v6 series) and merged Aviv's
  series v6: picked patch 1 (as patch 1 in this series), dropped patch
  2, re-wrote patch 3 (as patch 17 of this series).
- picked up two more bugfix patches from Jason's DMAR series
- picked up the following patch as well:
  "[PATCH v3] intel_iommu: allow dynamic switch of IOMMU region"

This series is a rework of Aviv B.D.'s vfio enablement series for
vt-d:

  https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01452.html

Aviv did a great job there; what was still lacking is mostly the
following:

(1) VFIO got duplicated IOTLB notifications due to the split VT-d
    IOMMU memory region (see the notifier sketch below).

(2) VT-d still did not provide a correct replay() mechanism (e.g.,
    when the IOMMU domain switches, things break).

This series should have solved the above two issues.
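
For issue (1), the key piece is the range-filtered notifier added by
"memory: add section range info for IOMMU notifier". As a rough sketch
of the idea (names as they ended up in include/exec/memory.h, so treat
the details as approximate for this exact revision):

    typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
                                IOMMUTLBEntry *data);

    struct IOMMUNotifier {
        IOMMUNotify notify;
        IOMMUNotifierFlag notifier_flags;
        /* Only notify for IOVAs within [start, end] */
        hwaddr start;
        hwaddr end;
        QLIST_ENTRY(IOMMUNotifier) node;
    };
    typedef struct IOMMUNotifier IOMMUNotifier;

    static inline void iommu_notifier_init(IOMMUNotifier *n, IOMMUNotify fn,
                                           IOMMUNotifierFlag flags,
                                           hwaddr start, hwaddr end)
    {
        n->notify = fn;
        n->notifier_flags = flags;
        n->start = start;
        n->end = end;
    }

Since each vfio notifier is bound to its own section range, a single
guest invalidation no longer fans out as duplicated notifications
across the split VT-d IOMMU memory region.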

Online repo:

  https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v7

I would be glad to hear any review comments on the above patches.

=========
Test Done
=========

Build test passed for x86_64/arm/ppc64.

Simply tested with x86_64, assigning two PCI devices to a single VM
and booting the VM with:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \
     -device intel-iommu,intremap=on,eim=off,caching-mode=on \
     -netdev user,id=net0,hostfwd=tcp::5555-:22 \
     -device virtio-net-pci,netdev=net0 \
     -device vfio-pci,host=03:00.0 \
     -device vfio-pci,host=02:00.0 \
     -trace events=".trace.vfio" \
     /var/lib/libvirt/images/vm1.qcow2

pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio
vtd_page_walk*
vtd_replay*
vtd_inv_desc*

Then, in the guest, run the following tool:

  https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c

With parameter:

  ./vfio-bind-group 00:03.0 00:04.0

Checking the host-side trace log, I can see pages being replayed and
mapped into the 00:04.0 device address space, like:

...
vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x401 lo 0x38fe1001
vtd_page_walk Page walk for ce (0x401, 0x38fe1001) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x38fe1000, level=3) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x35d31000, level=2) iova range 0x0 - 0x40000000
vtd_page_walk_level Page walk (base=0x34979000, level=1) iova range 0x0 - 0x200000
vtd_page_walk_one Page walk detected map level 0x1 iova 0x0 -> gpa 0x22dc3000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> gpa 0x22e25000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x2000 -> gpa 0x22e12000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> gpa 0x22e2d000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x4000 -> gpa 0x12a49000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> gpa 0x129bb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x6000 -> gpa 0x128db000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa 0x12a80000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x8000 -> gpa 0x12a7e000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa 0x12b22000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xa000 -> gpa 0x12b41000 mask 0xfff perm 3
...

=========
Todo List
=========

- error reporting for the assigned devices (as Tianyu has mentioned)

- per-domain address space: a better solution in the future may be to
  maintain one address space per IOMMU domain in the guest (so
  multiple devices can share the same address space if they share the
  same IOMMU domain in the guest), rather than one address space per
  device (which is the current vt-d implementation). However, that is
  a step beyond this series; let's first see whether we can provide a
  workable version of device assignment with vt-d protection.

- no need to notify IOTLB (psi/gsi/global) invalidations to devices
  with ATS enabled

- investigate the case where the guest maps a page while the mask
  covers already-mapped pages (e.g., map 12K-16K first, then map 0-12K)

- coalesce unmaps during the page walk (currently we send one
  notification per page)

- when doing PSI for unmap, can we send one notification directly
  instead of walking over the page table?

- more to come...

Thanks,

Aviv Ben-David (1):
  intel_iommu: add "caching-mode" option

Peter Xu (16):
  vfio: trace map/unmap for notify as well
  vfio: introduce vfio_get_vaddr()
  vfio: allow to notify unmap for very large region
  intel_iommu: simplify irq region translation
  intel_iommu: renaming gpa to iova where proper
  intel_iommu: convert dbg macros to traces for inv
  intel_iommu: convert dbg macros to trace for trans
  intel_iommu: vtd_slpt_level_shift check level
  memory: add section range info for IOMMU notifier
  memory: provide IOMMU_NOTIFIER_FOREACH macro
  memory: provide iommu_replay_all()
  memory: introduce memory_region_notify_one()
  memory: add MemoryRegionIOMMUOps.replay() callback
  intel_iommu: provide its own replay() callback
  intel_iommu: allow dynamic switch of IOMMU region
  intel_iommu: enable vfio devices

 hw/i386/intel_iommu.c          | 669 +++++++++++++++++++++++++++++++----------
 hw/i386/intel_iommu_internal.h |   2 +
 hw/i386/trace-events           |  36 +++
 hw/vfio/common.c               |  77 +++--
 hw/vfio/trace-events           |   2 +-
 hw/virtio/vhost.c              |   4 +-
 include/exec/memory.h          |  49 ++-
 include/hw/i386/intel_iommu.h  |  12 +
 memory.c                       |  52 +++-
 9 files changed, 710 insertions(+), 193 deletions(-)

-- 
2.7.4


Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Posted by Alex Williamson 7 years, 1 month ago
On Tue,  7 Feb 2017 16:28:02 +0800
Peter Xu <peterx@redhat.com> wrote:

> This is v7 of vt-d vfio enablement series.
[snip]

Hi Peter,

I'm trying to make use of this, with your vtd-vfio-enablement-v7 branch
(HEAD 0c1c4e738095).  I'm assigning an 82576 PF to a VM.  It works with
iommu=pt, but if I remove that option, the device does not work and
vfio_iommu_map_notify is never called.  Any suggestions?  My
commandline is below.  Thanks,

Alex

/usr/local/bin/qemu-system-x86_64 \
        -name guest=l1,debug-threads=on -S \
        -machine pc-q35-2.9,accel=kvm,usb=off,dump-guest-core=off,kernel-irqchip=split \
        -cpu host -m 10240 -realtime mlock=off -smp 4,sockets=1,cores=2,threads=2 \
        -no-user-config -nodefaults -monitor stdio -rtc base=utc,driftfix=slew \
        -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown \
        -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 \
        -boot strict=on \
        -device ioh3420,port=0x10,chassis=1,id=pci.1,bus=pcie.0,addr=0x2 \
        -device i82801b11-bridge,id=pci.2,bus=pcie.0,addr=0x1e \
        -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2,addr=0x0 \
        -device ioh3420,port=0x18,chassis=4,id=pci.4,bus=pcie.0,addr=0x3 \
        -device ioh3420,port=0x20,chassis=5,id=pci.5,bus=pcie.0,addr=0x4 \
        -device ioh3420,port=0x28,chassis=6,id=pci.6,bus=pcie.0,addr=0x5 \
        -device ioh3420,port=0x30,chassis=7,id=pci.7,bus=pcie.0,addr=0x6 \
        -device ioh3420,port=0x38,chassis=8,id=pci.8,bus=pcie.0,addr=0x7 \
        -device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 \
        -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x1d \
        -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \
        -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \
        -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \
        -drive file=/dev/vg_s20/lv_l1,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native \
        -device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
        -netdev user,id=hostnet0 \
        -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c2:62:30,bus=pci.1,addr=0x0 \
        -device usb-tablet,id=input0,bus=usb.0,port=1 \
        -vnc :0 -vga std \
        -device vfio-pci,host=01:00.0,id=hostdev0,bus=pci.8,addr=0x0 \
        -device intel-iommu,intremap=on,eim=off,caching-mode=on -trace events=/trace-events.txt -msg timestamp=on

# cat /trace-events.txt 
vfio_listener*
vfio_iommu*
vtd*

Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Posted by Peter Xu 7 years, 1 month ago
On Fri, Feb 17, 2017 at 10:18:35AM -0700, Alex Williamson wrote:
> On Tue,  7 Feb 2017 16:28:02 +0800
> Peter Xu <peterx@redhat.com> wrote:
> 
> > This is v7 of vt-d vfio enablement series.
> [snip]
> 
> Hi Peter,
> 
> I'm trying to make use of this, with your vtd-vfio-enablement-v7 branch
> (HEAD 0c1c4e738095).  I'm assigning an 82576 PF to a VM.  It works with
> iommu=pt, but if I remove that option, the device does not work and
> vfio_iommu_map_notify is never called.  Any suggestions?  My
> commandline is below.  Thanks,
> 
> Alex
> [snip]

Alex,

Thanks for testing this series.

I think I reproduced it using my 10g nic as well. What I got is:

[   23.724787] ixgbe 0000:01:00.0 enp1s0: Detected Tx Unit Hang
[   23.724787]   Tx Queue             <0>
[   23.724787]   TDH, TDT             <0>, <1>
[   23.724787]   next_to_use          <1>
[   23.724787]   next_to_clean        <0>
[   23.724787] tx_buffer_info[next_to_clean]
[   23.724787]   time_stamp           <fffbb8bb>
[   23.724787]   jiffies              <fffbc780>
[   23.729580] ixgbe 0000:01:00.0 enp1s0: tx hang 1 detected on queue 0, resetting adapter
[   23.730752] ixgbe 0000:01:00.0 enp1s0: initiating reset due to tx timeout
[   23.731768] ixgbe 0000:01:00.0 enp1s0: Reset adapter

Is this the problem you have encountered? (the adapter resetting continuously)

Interestingly, I found that the problem goes away after I move the
"-device intel-iommu,..." line before all the other devices.

In other words, here is a much shorter command line that reproduces the bug:

$qemu   -machine q35,accel=kvm,kernel-irqchip=split \
        -cpu host -smp 4 -m 2048 \
        -nographic -nodefaults -serial stdio \
        -device vfio-pci,host=05:00.0,bus=pci.1 \
        -device intel-iommu,intremap=on,eim=off,caching-mode=on \
        /images/fedora-25.qcow2

While this one, with the order of the two devices switched, seems to
be okay at least on my host:

$qemu   -machine q35,accel=kvm,kernel-irqchip=split \
        -cpu host -smp 4 -m 2048 \
        -nographic -nodefaults -serial stdio \
        -device intel-iommu,intremap=on,eim=off,caching-mode=on \
        -device vfio-pci,host=05:00.0,bus=pci.1 \
        /images/fedora-25.qcow2

So I am not sure how the realization order of these two devices
(intel-iommu, vfio-pci) affects the behavior. One thing I suspect is
that in vfio_realize() we have:

  group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);

and here we possibly get &address_space_memory instead of the correct
DMA address space, since the Intel IOMMU device has not been
initialized yet...
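
For reference, pci_device_iommu_address_space() looks roughly like the
below (paraphrased from hw/pci/pci.c from memory, so take the details
with a grain of salt):

    AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
    {
        PCIBus *bus = dev->bus;
        PCIBus *iommu_bus = bus;

        /* Walk up towards the root bus, looking for an IOMMU hook */
        while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
            iommu_bus = iommu_bus->parent_dev->bus;
        }
        if (iommu_bus && iommu_bus->iommu_fn) {
            return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
        }
        /* No IOMMU registered on the path: plain system memory */
        return &address_space_memory;
    }

If intel-iommu has not been realized by the time vfio asks, nothing
has called pci_setup_iommu() yet, iommu_fn is still NULL, and we fall
through to &address_space_memory.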

Before I go deeper, any thoughts?

Thanks,

-- peterx

Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Posted by Liu, Yi L 7 years, 1 month ago
> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel.com@nongnu.org]
> On Behalf Of Peter Xu
> Sent: Monday, February 20, 2017 3:48 PM
> [snip]
> 
> I think I reproduced it using my 10g nic as well. What I got is:
> [snip]
> 
> Is this the problem you have encountered? (the adapter resetting continuously)
> 
> Interestingly, I found that the problem goes away after I move the
> "-device intel-iommu,..." line before all the other devices.

I also encountered this interesting thing. Yes, it is: you must place
"-device intel-iommu" before the vfio-pci devices. If I remember correctly,
when "-device intel-iommu" is not in front of the others, vtd_realize() is
called after vfio_initfn(), which means the following code snippet is never
reached. Then there is no channel between the vfio device and intel-iommu,
and anything can happen once that channel is gone. So better to place
"intel-iommu" in first place ^_^

hw/vfio/common.c: vfio_listener_region_add()
    if (memory_region_is_iommu(section->mr)) {
        VFIOGuestIOMMU *giommu;

        trace_vfio_listener_region_add_iommu(iova, end);
        /*
         * FIXME: For VFIO iommu types which have KVM acceleration to
         * avoid bouncing all map/unmaps through qemu this way, this
         * would be the right place to wire that up (tell the KVM
         * device emulation the VFIO iommu handles to use).
         */
        giommu = g_malloc0(sizeof(*giommu));
        giommu->iommu = section->mr;
        giommu->iommu_offset = section->offset_within_address_space -
                               section->offset_within_region;
        giommu->container = container;
        giommu->n.notify = vfio_iommu_map_notify;
        giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
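
A few lines further down, the same function registers and replays the
notifier, which is exactly the channel that never gets set up when the
ordering is wrong (quoting roughly from memory, so approximate):

        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
        memory_region_iommu_replay(giommu->iommu, &giommu->n, false);
        return;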

Regards,
Yi L

Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Posted by Peter Xu 7 years, 1 month ago
On Mon, Feb 20, 2017 at 08:17:32AM +0000, Liu, Yi L wrote:
> [snip]
> I also encountered this interesting thing. Yes, it is: you must place
> "-device intel-iommu" before the vfio-pci devices. If I remember correctly,
> when "-device intel-iommu" is not in front of the others, vtd_realize() is
> called after vfio_initfn(), which means the following code snippet is never
> reached. Then there is no channel between the vfio device and intel-iommu,
> and anything can happen once that channel is gone. So better to place
> "intel-iommu" in first place ^_^
> 
> hw/vfio/common.c: vfio_listener_region_add()
> [snip]

Yeah. I think that's possibly because when we specify "-device
vfio-pci" before "-device intel-iommu", we end up listening on
&address_space_memory, and any real update on the IOMMU address space
is lost.

Imho forcing the user to add "-device intel-iommu" first might be a
little bit "tough" indeed. Not sure whether we should just provide (or
do we have one already?) a way to control the init order of the device
list.

Thanks,

-- peterx

Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Posted by Alex Williamson 7 years, 1 month ago
On Mon, 20 Feb 2017 15:47:31 +0800
Peter Xu <peterx@redhat.com> wrote:

> [snip]
> So I am not sure how the realization order of these two devices
> (intel-iommu, vfio-pci) affects the behavior. One thing I suspect is
> that in vfio_realize() we have:
> 
>   group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
> 
> and here we possibly get &address_space_memory instead of the correct
> DMA address space, since the Intel IOMMU device has not been
> initialized yet...
> 
> Before I go deeper, any thoughts?


Sounds like a solid theory, and it seems confirmed by Yi.  It makes it
pretty much impossible to test using libvirt <qemu:arg> support, which
is how I derived my VM commandline.  Thanks,

Alex

Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc enhances
Posted by Peter Xu 7 years ago
On Tue, Feb 07, 2017 at 04:28:02PM +0800, Peter Xu wrote:
> This is v7 of vt-d vfio enablement series.
> 
> v7:
> - for the two tracing patches: change subjects; remove the vtd_err()
>   and vtd_err_nonzero_rsvd() tracers, using a standalone trace for
>   each place instead; don't remove any DPRINTF() if there is no
>   replacement. [Jason]
> - add r-b and a-b for Alex/David/Jason.
> - in patch "intel_iommu: renaming gpa to iova where proper", convert
>   one more place that I missed [Jason]
> - fix the place where I should use "~0ULL" not "~0" [Jason]
> - squash patch 16 into 18 [Jason]

Hi, Michael,

Do you plan to have patches 11-17 in 2.9 as well? Just a kind
reminder in case you do, since we are reaching soft freeze. Thanks,

-- peterx