This is v7 of vt-d vfio enablement series.

v7:
- for the two traces patches: change the subjects; remove the vtd_err()
  and vtd_err_nonzero_rsvd() tracers, using a standalone trace for each
  of the places instead; don't remove any DPRINTF() if there is no
  replacement [Jason]
- add r-b and a-b from Alex/David/Jason
- in patch "intel_iommu: renaming gpa to iova where proper", convert
  one more place that I missed [Jason]
- fix the place where I should use "~0ULL", not "~0" [Jason]
- squash patch 16 into 18 [Jason]

v6:
- do unmap in all cases when replaying [Jason]
- do a global replay even if the context entry is invalidated [Jason]
- on IOMMU reset, send unmap to all registered notifiers [Jason]
- use the RCU read lock to protect the whole vfio_iommu_map_notify()
  [Alex, Paolo]

v5:
- fix patch 4's subject being too long, and a spelling error [Eric]
- add ack-by from Alex in patch 1 [Alex]
- squash patches 19/20 into patch 18 [Jason]
- fix comments in vtd_page_walk() [Jason]
- remove all error_report()s [Jason]
- add a comment for patch 18, mentioning that it covers vhost without
  ATS as well [Jason]
- remove the skipped-entry debug output during page walk [Jason]
- remove a duplicated page walk trace [Jason]
- some tuning in vtd_address_space_unmap() to provide a correct iova
  and addr_mask; for this, I also tuned the patch "memory: add section
  range info for IOMMU notifier" to loosen the range check

v4:
- convert all error_report()s into traces (in the two patches that do
  that)
- rebase to Jason's DMAR series (master + one more patch:
  "[PATCH V4 net-next] vhost_net: device IOTLB support")
- let vhost use the new API iommu_notifier_init() so it won't break
  vhost dmar [Jason]
- touch the commit message of the patch "intel_iommu: provide its own
  replay() callback": the old replay is not a dead loop, but it will
  consume lots of time [Jason]
- add a comment for the patch "intel_iommu: do replay when context
  invalidate" telling why replay won't be a problem even without CM=1
  [Jason]
- remove a useless comment line [Jason]
- remove the dmar_enabled parameter of vtd_switch_address_space() and
  vtd_switch_address_space_all() [MST, Jason]
- merge the vfio patches in, to support unmap of big ranges at the
  beginning ("[PATCH RFC 0/3] vfio: allow to notify unmap for very big
  region")
- use caching_mode instead of cache_mode_enabled, and "caching-mode"
  instead of "cache-mode" [Kevin]
- on receiving a context entry invalidation, unmap the entire region
  first, then replay [Alex]
- fix the commit message of the patch "intel_iommu: simplify irq
  region translation" [Kevin]
- handle domain/global invalidation, and notify where proper
  [Jason, Kevin]

v3:
- fix a style error reported by patchew
- fix the comment in the domain switch patch: use "IOMMU address
  space" rather than "IOMMU region" [Kevin]
- add ack-by from Paolo in the patch "memory: add section range info
  for IOMMU notifier" (collected separately, outside this thread)
- remove 3 patches which are merged already (from Jason)
- rebase to master b6c0897

v2:
- change the comment for the "end" parameter of vtd_page_walk()
  [Tianyu]
- change "a iova" to "an iova" [Yi]
- fix the fault-printed value for the GPA address in
  vtd_page_walk_level() (debug only)
- rebase to master (rather than Aviv's v6 series) and merge in Aviv's
  v6 series: picked patch 1 (as patch 1 in this series), dropped patch
  2, re-wrote patch 3 (as patch 17 of this series)
- picked up two more bugfix patches from Jason's DMAR series
- picked up the following patch as well:
  "[PATCH v3] intel_iommu: allow dynamic switch of IOMMU region"

This RFC series is a re-work of Aviv B.D.'s vfio enablement series
with vt-d:

  https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01452.html

Aviv has done a great job there, and what we still lacked was mostly
the following:

(1) VFIO got duplicated IOTLB notifications due to the split VT-d
    IOMMU memory region.

(2) VT-d still didn't provide a correct replay() mechanism (e.g., when
    the IOMMU domain switches, things broke).

This series should have solved both of the above issues.

Online repo:

  https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v7

I would be glad to hear any review comments on the above patches.

=========
Test Done
=========

Build test passed for x86_64/arm/ppc64.

Simply tested with x86_64, assigning two PCI devices to a single VM,
booting the VM using:

bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \
     -device intel-iommu,intremap=on,eim=off,caching-mode=on \
     -netdev user,id=net0,hostfwd=tcp::5555-:22 \
     -device virtio-net-pci,netdev=net0 \
     -device vfio-pci,host=03:00.0 \
     -device vfio-pci,host=02:00.0 \
     -trace events=".trace.vfio" \
     /var/lib/libvirt/images/vm1.qcow2

pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio
vtd_page_walk*
vtd_replay*
vtd_inv_desc*

Then, in the guest, run the following tool:

  https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c

with the parameters:

  ./vfio-bind-group 00:03.0 00:04.0

Checking the host side trace log, I can see pages being replayed and
mapped into the 00:04.0 device address space, like:

...
vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x401 lo 0x38fe1001
vtd_page_walk Page walk for ce (0x401, 0x38fe1001) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x38fe1000, level=3) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x35d31000, level=2) iova range 0x0 - 0x40000000
vtd_page_walk_level Page walk (base=0x34979000, level=1) iova range 0x0 - 0x200000
vtd_page_walk_one Page walk detected map level 0x1 iova 0x0 -> gpa 0x22dc3000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> gpa 0x22e25000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x2000 -> gpa 0x22e12000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> gpa 0x22e2d000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x4000 -> gpa 0x12a49000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> gpa 0x129bb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x6000 -> gpa 0x128db000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa 0x12a80000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x8000 -> gpa 0x12a7e000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa 0x12b22000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xa000 -> gpa 0x12b41000 mask 0xfff perm 3
...
=========
Todo List
=========

- error reporting for the assigned devices (as Tianyu has mentioned)

- per-domain address space: a better solution in the future may be to
  maintain one address space per IOMMU domain in the guest (so
  multiple devices can share the same address space if they are in the
  same IOMMU domain in the guest), rather than one address space per
  device (which is the current vt-d implementation). However, that's a
  step beyond this series; let's first see whether we can provide a
  workable version of device assignment with vt-d protection.

- don't notify IOTLB (psi/gsi/global) invalidations to devices that
  have ATS enabled

- investigate what happens when the guest maps a page while the mask
  covers already-mapped pages (e.g., map 12k-16k first, then map
  0-12k)

- coalesce unmaps during page walk (currently we send one notification
  per page)

- when doing a PSI for unmap, can we send one notification directly
  instead of walking over the page table?

- more to come...

Thanks,

Aviv Ben-David (1):
  intel_iommu: add "caching-mode" option

Peter Xu (16):
  vfio: trace map/unmap for notify as well
  vfio: introduce vfio_get_vaddr()
  vfio: allow to notify unmap for very large region
  intel_iommu: simplify irq region translation
  intel_iommu: renaming gpa to iova where proper
  intel_iommu: convert dbg macros to traces for inv
  intel_iommu: convert dbg macros to trace for trans
  intel_iommu: vtd_slpt_level_shift check level
  memory: add section range info for IOMMU notifier
  memory: provide IOMMU_NOTIFIER_FOREACH macro
  memory: provide iommu_replay_all()
  memory: introduce memory_region_notify_one()
  memory: add MemoryRegionIOMMUOps.replay() callback
  intel_iommu: provide its own replay() callback
  intel_iommu: allow dynamic switch of IOMMU region
  intel_iommu: enable vfio devices

 hw/i386/intel_iommu.c          | 669 +++++++++++++++++++++++++++++++----------
 hw/i386/intel_iommu_internal.h |   2 +
 hw/i386/trace-events           |  36 +++
 hw/vfio/common.c               |  77 +++--
 hw/vfio/trace-events           |   2 +-
 hw/virtio/vhost.c              |   4 +-
 include/exec/memory.h          |  49 ++-
 include/hw/i386/intel_iommu.h  |  12 +
 memory.c                       |  52 +++-
 9 files changed, 710 insertions(+), 193 deletions(-)

--
2.7.4
On Tue, 7 Feb 2017 16:28:02 +0800
Peter Xu <peterx@redhat.com> wrote:

> This is v7 of vt-d vfio enablement series.
[snip]
> =========
> Test Done
> =========
>
> Build test passed for x86_64/arm/ppc64.
>
> Simply tested with x86_64, assigning two PCI devices to a single VM,
> boot the VM using:
[snip]

Hi Peter,

I'm trying to make use of this, with your vtd-vfio-enablement-v7 branch
(HEAD 0c1c4e738095). I'm assigning an 82576 PF to a VM. It works with
iommu=pt, but if I remove that option, the device does not work and
vfio_iommu_map_notify is never called. Any suggestions? My command
line is below.
Thanks,

Alex

/usr/local/bin/qemu-system-x86_64 \
 -name guest=l1,debug-threads=on -S \
 -machine pc-q35-2.9,accel=kvm,usb=off,dump-guest-core=off,kernel-irqchip=split \
 -cpu host -m 10240 -realtime mlock=off -smp 4,sockets=1,cores=2,threads=2 \
 -no-user-config -nodefaults -monitor stdio -rtc base=utc,driftfix=slew \
 -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown \
 -global ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 \
 -boot strict=on \
 -device ioh3420,port=0x10,chassis=1,id=pci.1,bus=pcie.0,addr=0x2 \
 -device i82801b11-bridge,id=pci.2,bus=pcie.0,addr=0x1e \
 -device pci-bridge,chassis_nr=3,id=pci.3,bus=pci.2,addr=0x0 \
 -device ioh3420,port=0x18,chassis=4,id=pci.4,bus=pcie.0,addr=0x3 \
 -device ioh3420,port=0x20,chassis=5,id=pci.5,bus=pcie.0,addr=0x4 \
 -device ioh3420,port=0x28,chassis=6,id=pci.6,bus=pcie.0,addr=0x5 \
 -device ioh3420,port=0x30,chassis=7,id=pci.7,bus=pcie.0,addr=0x6 \
 -device ioh3420,port=0x38,chassis=8,id=pci.8,bus=pcie.0,addr=0x7 \
 -device ich9-usb-ehci1,id=usb,bus=pcie.0,addr=0x1d.0x7 \
 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pcie.0,multifunction=on,addr=0x1d \
 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pcie.0,addr=0x1d.0x1 \
 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pcie.0,addr=0x1d.0x2 \
 -device virtio-serial-pci,id=virtio-serial0,bus=pci.4,addr=0x0 \
 -drive file=/dev/vg_s20/lv_l1,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native \
 -device virtio-blk-pci,scsi=off,bus=pci.5,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
 -netdev user,id=hostnet0 \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:c2:62:30,bus=pci.1,addr=0x0 \
 -device usb-tablet,id=input0,bus=usb.0,port=1 \
 -vnc :0 -vga std \
 -device vfio-pci,host=01:00.0,id=hostdev0,bus=pci.8,addr=0x0 \
 -device intel-iommu,intremap=on,eim=off,caching-mode=on \
 -trace events=/trace-events.txt -msg timestamp=on

# cat /trace-events.txt
vfio_listener*
vfio_iommu*
vtd*
On Fri, Feb 17, 2017 at 10:18:35AM -0700, Alex Williamson wrote:
> On Tue, 7 Feb 2017 16:28:02 +0800
> Peter Xu <peterx@redhat.com> wrote:
>
> > This is v7 of vt-d vfio enablement series.
> [snip]
>
> Hi Peter,
>
> I'm trying to make use of this, with your vtd-vfio-enablement-v7 branch
> (HEAD 0c1c4e738095). I'm assigning an 82576 PF to a VM. It works with
> iommu=pt, but if I remove that option, the device does not work and
> vfio_iommu_map_notify is never called. Any suggestions? My command
> line is below. Thanks,
>
> Alex
> [snip]

Alex,

Thanks for testing this series.

I think I reproduced it using my 10g NIC as well. What I got is:

[   23.724787] ixgbe 0000:01:00.0 enp1s0: Detected Tx Unit Hang
[   23.724787]   Tx Queue             <0>
[   23.724787]   TDH, TDT             <0>, <1>
[   23.724787]   next_to_use          <1>
[   23.724787]   next_to_clean        <0>
[   23.724787] tx_buffer_info[next_to_clean]
[   23.724787]   time_stamp           <fffbb8bb>
[   23.724787]   jiffies              <fffbc780>
[   23.729580] ixgbe 0000:01:00.0 enp1s0: tx hang 1 detected on queue 0, resetting adapter
[   23.730752] ixgbe 0000:01:00.0 enp1s0: initiating reset due to tx timeout
[   23.731768] ixgbe 0000:01:00.0 enp1s0: Reset adapter

Is this the problem you have encountered? (adapter continuously resetting)

Interestingly, I found that the problem solves itself after I move the
"-device intel-iommu,..." line before all the other devices. Or say,
this is a much shorter reproducer that hits the bug:

$qemu -machine q35,accel=kvm,kernel-irqchip=split \
      -cpu host -smp 4 -m 2048 \
      -nographic -nodefaults -serial stdio \
      -device vfio-pci,host=05:00.0,bus=pci.1 \
      -device intel-iommu,intremap=on,eim=off,caching-mode=on \
      /images/fedora-25.qcow2

While this may possibly be okay, at least on my host (switching the
order of the two devices):

$qemu -machine q35,accel=kvm,kernel-irqchip=split \
      -cpu host -smp 4 -m 2048 \
      -nographic -nodefaults -serial stdio \
      -device intel-iommu,intremap=on,eim=off,caching-mode=on \
      -device vfio-pci,host=05:00.0,bus=pci.1 \
      /images/fedora-25.qcow2

So I'm not sure how the ordering of realization of these two devices
(intel-iommu, vfio-pci) affects the behavior.
One thing I suspect is that in vfio_realize(), we have:

    group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);

while here we are possibly getting &address_space_memory instead of
the correct DMA address space, since the Intel IOMMU device has not
yet been initialized...
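Just to illustrate the theory, the lookup is roughly like below (a
paraphrased sketch of pci_device_iommu_address_space() in
hw/pci/pci.c -- the real code also walks up parent buses, so the
details differ):

    AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
    {
        PCIBus *bus = dev->bus;

        /* bus->iommu_fn is only set once vtd_realize() has run
         * pci_setup_iommu(); if vfio-pci realizes first, the hook
         * is still NULL here... */
        if (bus && bus->iommu_fn) {
            return bus->iommu_fn(bus, bus->iommu_opaque, dev->devfn);
        }

        /* ...and we fall back to the default address space, which
         * is what vfio would then latch onto. */
        return &address_space_memory;
    }

Before I go deeper, any thoughts?

Thanks,

-- peterx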
> -----Original Message-----
> From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel.com@nongnu.org]
> On Behalf Of Peter Xu
> Sent: Monday, February 20, 2017 3:48 PM
> To: Alex Williamson <alex.williamson@redhat.com>
> Cc: Lan, Tianyu <tianyu.lan@intel.com>; Tian, Kevin <kevin.tian@intel.com>;
> mst@redhat.com; jan.kiszka@siemens.com; jasowang@redhat.com;
> qemu-devel@nongnu.org; bd.aviv@gmail.com; David Gibson
> <david@gibson.dropbear.id.au>
> Subject: Re: [Qemu-devel] [PATCH v7 00/17] VT-d: vfio enablement and misc
> enhances
>
> On Fri, Feb 17, 2017 at 10:18:35AM -0700, Alex Williamson wrote:
> > On Tue, 7 Feb 2017 16:28:02 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >
> > > This is v7 of vt-d vfio enablement series.
> > [snip]
> >
> > Hi Peter,
> >
> > I'm trying to make use of this, with your vtd-vfio-enablement-v7
> > branch (HEAD 0c1c4e738095). I'm assigning an 82576 PF to a VM. It
> > works with iommu=pt, but if I remove that option, the device does not
> > work and vfio_iommu_map_notify is never called. Any suggestions? My
> > command line is below. Thanks,
> >
> > Alex
> [snip]
>
> Is this the problem you have encountered? (adapter continuously resetting)
>
> Interestingly, I found that the problem solves itself after I move the
> "-device intel-iommu,..." line before all the other devices.

I also encountered this interesting thing. Yes, it is: you must place
"-device intel-iommu" before the vfio-pci devices.
If I remember correctly, if "-device intel-iommu" is not in front of
the others, vtd_realize() is called after vfio_initfn(), which means
the following code snippet is never reached. Then there is no channel
between the vfio device and intel-iommu, so anything can go wrong once
that channel is gone. So better to place "intel-iommu" first ^_^

hw/vfio/common.c: vfio_listener_region_add()

    if (memory_region_is_iommu(section->mr)) {
        VFIOGuestIOMMU *giommu;

        trace_vfio_listener_region_add_iommu(iova, end);
        /*
         * FIXME: For VFIO iommu types which have KVM acceleration to
         * avoid bouncing all map/unmaps through qemu this way, this
         * would be the right place to wire that up (tell the KVM
         * device emulation the VFIO iommu handles to use).
         */
        giommu = g_malloc0(sizeof(*giommu));
        giommu->iommu = section->mr;
        giommu->iommu_offset = section->offset_within_address_space -
                               section->offset_within_region;
        giommu->container = container;
        giommu->n.notify = vfio_iommu_map_notify;
        giommu->n.notifier_flags = IOMMU_NOTIFIER_ALL;
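For completeness, if I read hw/vfio/common.c right, the function then
goes on to hook up the notifier and replay existing mappings, roughly
like below (paraphrased; the details may differ):

        /* Register the notifier on the IOMMU memory region, then
         * replay the current mappings into this container. */
        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);

        memory_region_register_iommu_notifier(giommu->iommu, &giommu->n);
        memory_region_iommu_replay(giommu->iommu, &giommu->n, false);

Regards,
Yi L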
On Mon, Feb 20, 2017 at 08:17:32AM +0000, Liu, Yi L wrote:
> > -----Original Message-----
> > From: Qemu-devel [mailto:qemu-devel-bounces+yi.l.liu=intel.com@nongnu.org]
> > On Behalf Of Peter Xu
> [snip]
> >
> > Interestingly, I found that the problem solves itself after I move
> > the "-device intel-iommu,..." line before all the other devices.
>
> I also encountered this interesting thing. Yes, it is: you must place
> "-device intel-iommu" before the vfio-pci devices. If I remember
> correctly, if "-device intel-iommu" is not in front of the others,
> vtd_realize() is called after vfio_initfn(), which means the
> following code snippet is never reached. Then there is no channel
> between the vfio device and intel-iommu, so anything can go wrong
> once that channel is gone.
> So better to place "intel-iommu" first ^_^
>
> hw/vfio/common.c: vfio_listener_region_add()
> [snip]

Yeah. I think that's possibly because when we do "-device vfio-pci"
first and then "-device intel-iommu", we are actually listening on
&address_space_memory, and any real update on the IOMMU address space
is lost.

Imho forcing the user to add "-device intel-iommu" first might be a
little bit "tough" indeed. Not sure whether we should just provide
(or do we have it?) a way to decide the init order of the device list.
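To illustrate, this is roughly the hook that vtd_realize() installs on
the root bus (a paraphrased sketch of pci_setup_iommu() in
hw/pci/pci.c):

    void pci_setup_iommu(PCIBus *bus, PCIIOMMUFunc fn, void *opaque)
    {
        bus->iommu_fn = fn;
        bus->iommu_opaque = opaque;
    }

Until this runs, pci_device_iommu_address_space() can only return
&address_space_memory, so a vfio-pci device realized earlier resolves
the wrong address space.

Thanks,

-- peterx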
On Mon, 20 Feb 2017 15:47:31 +0800
Peter Xu <peterx@redhat.com> wrote:

> On Fri, Feb 17, 2017 at 10:18:35AM -0700, Alex Williamson wrote:
> > On Tue, 7 Feb 2017 16:28:02 +0800
> > Peter Xu <peterx@redhat.com> wrote:
> >
> > > This is v7 of vt-d vfio enablement series.
> > [snip]
> >
> > Hi Peter,
> >
> > I'm trying to make use of this, with your vtd-vfio-enablement-v7
> > branch (HEAD 0c1c4e738095). I'm assigning an 82576 PF to a VM. It
> > works with iommu=pt, but if I remove that option, the device does
> > not work and vfio_iommu_map_notify is never called. Any
> > suggestions? My command line is below. Thanks,
> >
> > Alex
> [snip]
>
> Is this the problem you have encountered? (adapter continuously resetting)
>
> Interestingly, I found that the problem solves itself after I move the
> "-device intel-iommu,..." line before all the other devices.
> [snip]
>
> So I'm not sure how the ordering of realization of these two devices
> (intel-iommu, vfio-pci) affects the behavior. One thing I suspect is
> that in vfio_realize(), we have:
>
>     group = vfio_get_group(groupid, pci_device_iommu_address_space(pdev), errp);
>
> while here we are possibly getting &address_space_memory instead of
> the correct DMA address space, since the Intel IOMMU device has not
> yet been initialized...
>
> Before I go deeper, any thoughts?

Sounds like a plausible theory, and it seems confirmed by Yi. It makes
it pretty much impossible to test using libvirt <qemu:arg> support,
which is how I derived my VM command line. Thanks,

Alex
On Tue, Feb 07, 2017 at 04:28:02PM +0800, Peter Xu wrote:
> This is v7 of vt-d vfio enablement series.
>
> v7:
> [snip]

Hi, Michael,

Do you have plans to include patches 11-17 in 2.9 as well? Just a kind
reminder in case you do, since we're reaching the soft freeze.

Thanks,

-- peterx