[edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support

Brijesh Singh posted 3 patches 7 years, 5 months ago
Failed in applying to current master (apply log)
OvmfPkg/Library/VirtioLib/VirtioLib.inf      |   1 +
OvmfPkg/VirtioBlkDxe/VirtioBlk.inf           |   5 +
OvmfPkg/VirtioGpuDxe/VirtioGpu.inf           |   1 +
OvmfPkg/VirtioNetDxe/VirtioNet.inf           |   1 +
OvmfPkg/VirtioRngDxe/VirtioRng.inf           |   1 +
OvmfPkg/VirtioScsiDxe/VirtioScsi.inf         |   1 +
OvmfPkg/Include/IndustryStandard/Virtio095.h |   1 +
OvmfPkg/Include/IndustryStandard/Virtio10.h  |   5 +
OvmfPkg/Include/Library/VirtioLib.h          |  20 ++++
OvmfPkg/Library/VirtioLib/VirtioLib.c        |  96 ++++++++++++++-
OvmfPkg/VirtioBlkDxe/VirtioBlk.c             | 125 ++++++++++++++++++--
11 files changed, 244 insertions(+), 13 deletions(-)
[edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago
I have found that OVMF fails to detect the disk when iommu_platform is set from
qemu cli. The failure occurs during the feature bit negotiation.

Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch series introduced
a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute and methods to
allocate, free, map and unmap the DMA memory for the master bus devices

In this patch series, I have tried to enable the IOMMU_PLATFORM feature for
VirtioBlkDevice. I am sending this as RFC to seek feedback before I extend the support
for other Virtio devices. The patch has been tested in SEV guest - mainly because
IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If needed, I can
extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.

qemu cli used for testing:

# $QEMU \
  ...
  -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
  -device virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
  ...

Cc: Jordan Justen <jordan.l.justen@intel.com>
Cc: Laszlo Ersek <lersek@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>

Brijesh Singh (3):
  OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
  OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
  OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support

 OvmfPkg/Library/VirtioLib/VirtioLib.inf      |   1 +
 OvmfPkg/VirtioBlkDxe/VirtioBlk.inf           |   5 +
 OvmfPkg/VirtioGpuDxe/VirtioGpu.inf           |   1 +
 OvmfPkg/VirtioNetDxe/VirtioNet.inf           |   1 +
 OvmfPkg/VirtioRngDxe/VirtioRng.inf           |   1 +
 OvmfPkg/VirtioScsiDxe/VirtioScsi.inf         |   1 +
 OvmfPkg/Include/IndustryStandard/Virtio095.h |   1 +
 OvmfPkg/Include/IndustryStandard/Virtio10.h  |   5 +
 OvmfPkg/Include/Library/VirtioLib.h          |  20 ++++
 OvmfPkg/Library/VirtioLib/VirtioLib.c        |  96 ++++++++++++++-
 OvmfPkg/VirtioBlkDxe/VirtioBlk.c             | 125 ++++++++++++++++++--
 11 files changed, 244 insertions(+), 13 deletions(-)

-- 
Brijesh Singh
2.7.4

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 5 months ago
Adding Ard

On 07/20/17 00:09, Brijesh Singh wrote:
> I have found that OVMF fails to detect the disk when iommu_platform is set from
> qemu cli. The failure occurs during the feature bit negotiation.
> 
> Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch series introduced
> a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute and methods to
> allocate, free, map and unmap the DMA memory for the master bus devices
> 
> In this patch series, I have tried to enable the IOMMU_PLATFORM feature for
> VirtioBlkDevice. I am sending this as RFC to seek feedback before I extend the support
> for other Virtio devices. The patch has been tested in SEV guest - mainly because
> IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If needed, I can
> extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.
> 
> qemu cli used for testing:
> 
> # $QEMU \
>   ...
>   -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
>   -device virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
>   ...
> 
> Cc: Jordan Justen <jordan.l.justen@intel.com>
> Cc: Laszlo Ersek <lersek@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Contributed-under: TianoCore Contribution Agreement 1.0
> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
> 
> Brijesh Singh (3):
>   OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
>   OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
>   OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support

In the course of processing my post-PTO email backlog, yesterday I
skimmed this topic very quickly, and thought about it for an hour or so
(while away for the computer). Here's my take on it.


(1) The closest to any formal language that I found for
VIRTIO_F_IOMMU_PLATFORM are:

  https://issues.oasis-open.org/browse/VIRTIO-154
  https://lists.oasis-open.org/archives/virtio-dev/201610/msg00121.html

Are these references up-to-date? The ticket also references SVN r587 as
the resolution, but that link is dead.


(2) A few weeks back I saw Jason's SeaBIOS patch (also pointed out to me
more recently by Brijesh, in private):

  [SeaBIOS] [PATCH] virtio: IOMMU support

I don't understand this patch. (I also don't understand the
"iommu_platform" device property on the QEMU command line.) According to
the spec language quoted from the mailing list above, we have four cases:

(2.1) device does not offer VIRITO_F_IOMMU_PLATFORM --> everything works
like before

(2.2) device offers VIRITO_F_IOMMU_PLATFORM, but the driver does not
negotiate it --> device is allowed to reject functioning

* device offers VIRITO_F_IOMMU_PLATFORM and the driver negotiates it:
  there are two possibilities:
  (2.3) the driver *disables* the IOMMU, and works like before
  (2.4) the driver *configures* the IOMMU and works accordingly

The SeaBIOS patch negotiates VIRITO_F_IOMMU_PLATFORM, but it *neither*
disables *nor* configures the IOMMU. It simply *ignores* the IOMMU. I
don't see how that is any different *in effect* from simply not
negotiating VIRITO_F_IOMMU_PLATFORM -- case (2.2) --, and I don't
understand why QEMU tolerates "ignoring the IOMMU" differently from "not
negotiating the flag".


(3) *If* we indeed just wanted to follow the SeaBIOS patch, then
VIRTIO_F_IOMMU_PLATFORM should be treated in OVMF *precisely* in
parallel with VIRTIO_F_VERSION_1.


(4) I'm confused by the intersection of the SEV and
VIRTIO_F_IOMMU_PLATFORM use cases. The IoMmu DXE driver added in edk2
commit f9d129e68a45 ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06) is
specific to SEV.

In SEV-less guests, the IoMmu DXE driver will not install the IOMMU
protocol, but the QEMU command line may still contain the
"iommu_platform=true" property. For example, as far as I recall, DPDK
guest payloads use an emulated "viommu" device. For this OVMF, would
have to provide a separate IOMMU DXE driver, one that could actually
interact with the "viommu" device. And there should be some coordination
that exactly one instance of gEdkiiIoMmuProtocolGuid OR
gIoMmuAbsentProtocolGuid be installed.

I see that the SEV references are only in the blurb of the patch set; no
actual commit message refers to SEV. That's OK; I just think the blurb
is confusing.


(5) Now I'm coming to my main point. The virtio device drivers in
OvmfPkg are UEFI_DRIVER modules that conform to the UEFI driver model.
They consume our home-grown the VIRTIO_DEVICE_PROTOCOL, and produce
whatever UEFI protocol is appropriate on top (for example,
EFI_BLOCK_IO_PROTOCOL in VirtioBlkDxe).

Such a driver has no business talking to the platform's IOMMU protocol
directly (even if there is one). Instead:


(5.1) The VIRTIO_DEVICE_PROTOCOL has to be extended with the necessary

  AllocateSharedPages, FreeSharedPages, MapSharedPages, UnmapSharedPages

functions.


(5.2) All top-level virtio driver code, including VirtioLib, has to be
rebased to the new VIRTIO_DEVICE_PROTOCOL member functions. This covers
the ring memory itself, the memory pointed-to by the descriptors placed
on the ring(s) -- i.e., the device specific requests --, and all further
memory that is referenced by those device specific requests.

This will result in a larger memory footprint, as all current pool
allocations will be turned into page allocations, but I guess that is
tolerable.


(5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM simply
in parallel with VIRTIO_F_VERSION_1, and don't act upon
VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically this
is just my point (3) from above.


(5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:

- "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
  virtio-pci devices, and offers virtio 0.9.5 semantics.

- "ArmVirtPkg/VirtioFdtDxe" (via "OvmfPkg/Library/VirtioMmioDeviceLib")
  binds virtio-mmio devices, and offers virtio 0.9.5 semantics.

- "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and offers
  virtio 1.0.0 semantics.

The first two drivers should implement the AllocateSharedPages() and
FreeSharedPages() member functions simply with the corresponding
MemoryAllocationLib functions (using BootServicesData type memory), and
implement the MapSharedPages() and UnmapSharedPages() member functions
as no-ops (return the input addresses transparently).

The third driver should implement all four new member functions by
respectively delegating the job to:
- EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
- EFI_PCI_IO_PROTOCOL.FreeBuffer()
- EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
- EFI_PCI_IO_PROTOCOL.Unmap()

The EFI_PCI_IO_PROTOCOL implementation will delegate these calls to the
platform-specific PCI host bridge / root bridge driver, and *that*
driver in turn is allowed to talk to an IOMMU protocol (if any).

(This last step is already covered by the following edk2 commits:
- generally, c15da8eb3587 ("MdeModulePkg/PciHostBridge: Add IOMMU
support.", 2017-04-29),
- specifically for SEV in OVMF, c6ab9aecb71b ("OvmfPkg: update
PciHostBridgeDxe to use PlatformHasIoMmuLib", 2017-07-06).)


(5.5) There's a delicate question that has to be considered with care;
namely the ExitBootServices() callbacks in the virtio drivers.

Currently these functions perform virtio resets. They cause the devices
to forget all their configuration (including guest RAM references).
Aborting in-flight DMA (e.g. in VirtioNetDxe) is a practice that is
specifically recommended in the UEFI Driver Writers' Guide; plus at some
point the guest kernel will reclaim and overwrite BootServicesData type
memory. If at that point QEMU is still looking at some (originally
firmware-allocated) areas as virtio rings, the results won't be amusing.
(Speaking from experience.) Hence the resetting of virtio devices upon
ExitBootServices().

Now, remember that ExitBootServices() callbacks *must not* change the
UEFI memory map, so *no* memory allocations *must* be released in the
callbacks. The virtio resets performed in the callbacks are surgical for
this reason; nothing else is being done. The memory will be reclaimed by
the OS, later.

With an IOMMU in the picture, further actions become necessary: *after*
the virtio reset, any buffers previously in use (including rings, device
specific requests pointed-to by descriptors, and any further memory
referenced by those requests) must be *unmapped*, but *not freed*.

(Speaking in SEV terms, this will result in those memory areas seeing
their C bits restored, without changing the UEFI memmap.)

This means that:
- the new VIRTIO_DEVICE_PROTOCOL.UnmapSharedPages() function has to be
called judiciously from these callbacks, after the virtio reset,
- *and* that the *entire* call chain originating from
UnmapSharedPages(), through PciIo, through PciRootBridgeIo, to
EdkiiIoMmu, *must* not call gBS->FreePool() or gBS->FreePages() (or the
equivalent MemoryAllocationLib functions).

Note that if the last requirement is currently violated (outside of
OvmfPkg), then that is a general problem for physical platforms as well
-- IMO, a physical NIC driver too should be able to abort DMA in its
exit-boot-services callback and then unmap any relevant IOMMU mappings
(via PciIo->Unmap().)

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago
Hi Laszlo,

On 07/25/2017 01:17 PM, Laszlo Ersek wrote:
> Adding Ard
> 
> On 07/20/17 00:09, Brijesh Singh wrote:
>> I have found that OVMF fails to detect the disk when iommu_platform is set from
>> qemu cli. The failure occurs during the feature bit negotiation.
>>
>> Recently, EDKII introduced IOMMU protocol d1fddc4533bf. SEV patch series introduced
>> a IoMmu protocol driver f9d129e68a45 to set a DMA access attribute and methods to
>> allocate, free, map and unmap the DMA memory for the master bus devices
>>
>> In this patch series, I have tried to enable the IOMMU_PLATFORM feature for
>> VirtioBlkDevice. I am sending this as RFC to seek feedback before I extend the support
>> for other Virtio devices. The patch has been tested in SEV guest - mainly because
>> IoMmuDxe driver installs the IOMMU protocol for SEV guest only. If needed, I can
>> extend the IoMmuDxe driver to install IOMMU protocol for non SEV guests.
>>
>> qemu cli used for testing:
>>
>> # $QEMU \
>>    ...
>>    -drive file=${HDA_FILE},if=none,id=disk0,format=qcow2 \
>>    -device virtio-blk-pci,drive=disk0,disable-legacy=on,iommu_platform=true,disable-modern=off,scsi=off
>>    ...
>>
>> Cc: Jordan Justen <jordan.l.justen@intel.com>
>> Cc: Laszlo Ersek <lersek@redhat.com>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Contributed-under: TianoCore Contribution Agreement 1.0
>> Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
>>
>> Brijesh Singh (3):
>>    OvmfPkg/Include/Virtio10: Define VIRTIO_F_IOMMU_PLATFORM feature bit
>>    OvmfPkg/VirtioLib: Add IOMMU_PLATFORM support
>>    OvmfPkg/VirtioBlkDxe: Add VIRITO_F_IOMMU_PLATFORM support
> 
> In the course of processing my post-PTO email backlog, yesterday I
> skimmed this topic very quickly, and thought about it for an hour or so
> (while away for the computer). Here's my take on it.
> 
> 
> (1) The closest to any formal language that I found for
> VIRTIO_F_IOMMU_PLATFORM are:
> 
>    https://issues.oasis-open.org/browse/VIRTIO-154
>    https://lists.oasis-open.org/archives/virtio-dev/201610/msg00121.html
> 
> Are these references up-to-date? The ticket also references SVN r587 as
> the resolution, but that link is dead.
> 
> 
> (2) A few weeks back I saw Jason's SeaBIOS patch (also pointed out to me
> more recently by Brijesh, in private):
> 
>    [SeaBIOS] [PATCH] virtio: IOMMU support
> 
> I don't understand this patch. (I also don't understand the
> "iommu_platform" device property on the QEMU command line.) According to
> the spec language quoted from the mailing list above, we have four cases:
> 
> (2.1) device does not offer VIRITO_F_IOMMU_PLATFORM --> everything works
> like before
> 
> (2.2) device offers VIRITO_F_IOMMU_PLATFORM, but the driver does not
> negotiate it --> device is allowed to reject functioning
> 
> * device offers VIRITO_F_IOMMU_PLATFORM and the driver negotiates it:
>    there are two possibilities:
>    (2.3) the driver *disables* the IOMMU, and works like before
>    (2.4) the driver *configures* the IOMMU and works accordingly
> 
> The SeaBIOS patch negotiates VIRITO_F_IOMMU_PLATFORM, but it *neither*
> disables *nor* configures the IOMMU. It simply *ignores* the IOMMU. I
> don't see how that is any different *in effect* from simply not
> negotiating VIRITO_F_IOMMU_PLATFORM -- case (2.2) --, and I don't
> understand why QEMU tolerates "ignoring the IOMMU" differently from "not
> negotiating the flag".
> 
> 
> (3) *If* we indeed just wanted to follow the SeaBIOS patch, then
> VIRTIO_F_IOMMU_PLATFORM should be treated in OVMF *precisely* in
> parallel with VIRTIO_F_VERSION_1.
> 
> 
> (4) I'm confused by the intersection of the SEV and
> VIRTIO_F_IOMMU_PLATFORM use cases. The IoMmu DXE driver added in edk2
> commit f9d129e68a45 ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06) is
> specific to SEV.
> 
> In SEV-less guests, the IoMmu DXE driver will not install the IOMMU
> protocol, but the QEMU command line may still contain the
> "iommu_platform=true" property. For example, as far as I recall, DPDK
> guest payloads use an emulated "viommu" device. For this OVMF, would
> have to provide a separate IOMMU DXE driver, one that could actually
> interact with the "viommu" device. And there should be some coordination
> that exactly one instance of gEdkiiIoMmuProtocolGuid OR
> gIoMmuAbsentProtocolGuid be installed.
> 
> I see that the SEV references are only in the blurb of the patch set; no
> actual commit message refers to SEV. That's OK; I just think the blurb
> is confusing.
> 

I was trying to figure out why setting iommu_platform fails to detect the HDD
in OVMF. Since SEV IOMMU work was still fresh in my mind hence I thought we
simply need to update the Virtio drivers to consume IOMMU protocol directly.
But your explanation on how things work makes sense; thanks for explaining
it in great detail.

I will follow your recommendation, and look into extending VIRTIO_DEVICE_PROTOCOL
with necessary functions and delegate the work to EFI_PCI_IO_PROTOCOL (which will
use IOMMU driver if available).

Thanks
Brijesh

> 
> (5) Now I'm coming to my main point. The virtio device drivers in
> OvmfPkg are UEFI_DRIVER modules that conform to the UEFI driver model.
> They consume our home-grown the VIRTIO_DEVICE_PROTOCOL, and produce
> whatever UEFI protocol is appropriate on top (for example,
> EFI_BLOCK_IO_PROTOCOL in VirtioBlkDxe).
> 
> Such a driver has no business talking to the platform's IOMMU protocol
> directly (even if there is one). Instead:
> 
> 
> (5.1) The VIRTIO_DEVICE_PROTOCOL has to be extended with the necessary
> 
>    AllocateSharedPages, FreeSharedPages, MapSharedPages, UnmapSharedPages
> 
> functions.
> 
> 
> (5.2) All top-level virtio driver code, including VirtioLib, has to be
> rebased to the new VIRTIO_DEVICE_PROTOCOL member functions. This covers
> the ring memory itself, the memory pointed-to by the descriptors placed
> on the ring(s) -- i.e., the device specific requests --, and all further
> memory that is referenced by those device specific requests.
> 
> This will result in a larger memory footprint, as all current pool
> allocations will be turned into page allocations, but I guess that is
> tolerable.
> 
> 
> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM simply
> in parallel with VIRTIO_F_VERSION_1, and don't act upon
> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically this
> is just my point (3) from above.
> 
> 
> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
> 
> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>    virtio-pci devices, and offers virtio 0.9.5 semantics.
> 
> - "ArmVirtPkg/VirtioFdtDxe" (via "OvmfPkg/Library/VirtioMmioDeviceLib")
>    binds virtio-mmio devices, and offers virtio 0.9.5 semantics.
> 
> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and offers
>    virtio 1.0.0 semantics.
> 
> The first two drivers should implement the AllocateSharedPages() and
> FreeSharedPages() member functions simply with the corresponding
> MemoryAllocationLib functions (using BootServicesData type memory), and
> implement the MapSharedPages() and UnmapSharedPages() member functions
> as no-ops (return the input addresses transparently).
> 
> The third driver should implement all four new member functions by
> respectively delegating the job to:
> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
> - EFI_PCI_IO_PROTOCOL.Unmap()
> 
> The EFI_PCI_IO_PROTOCOL implementation will delegate these calls to the
> platform-specific PCI host bridge / root bridge driver, and *that*
> driver in turn is allowed to talk to an IOMMU protocol (if any).
> 
> (This last step is already covered by the following edk2 commits:
> - generally, c15da8eb3587 ("MdeModulePkg/PciHostBridge: Add IOMMU
> support.", 2017-04-29),
> - specifically for SEV in OVMF, c6ab9aecb71b ("OvmfPkg: update
> PciHostBridgeDxe to use PlatformHasIoMmuLib", 2017-07-06).)
> 
> 
> (5.5) There's a delicate question that has to be considered with care;
> namely the ExitBootServices() callbacks in the virtio drivers.
> 
> Currently these functions perform virtio resets. They cause the devices
> to forget all their configuration (including guest RAM references).
> Aborting in-flight DMA (e.g. in VirtioNetDxe) is a practice that is
> specifically recommended in the UEFI Driver Writers' Guide; plus at some
> point the guest kernel will reclaim and overwrite BootServicesData type
> memory. If at that point QEMU is still looking at some (originally
> firmware-allocated) areas as virtio rings, the results won't be amusing.
> (Speaking from experience.) Hence the resetting of virtio devices upon
> ExitBootServices().
> 
> Now, remember that ExitBootServices() callbacks *must not* change the
> UEFI memory map, so *no* memory allocations *must* be released in the
> callbacks. The virtio resets performed in the callbacks are surgical for
> this reason; nothing else is being done. The memory will be reclaimed by
> the OS, later.
> 
> With an IOMMU in the picture, further actions become necessary: *after*
> the virtio reset, any buffers previously in use (including rings, device
> specific requests pointed-to by descriptors, and any further memory
> referenced by those requests) must be *unmapped*, but *not freed*.
> 
> (Speaking in SEV terms, this will result in those memory areas seeing
> their C bits restored, without changing the UEFI memmap.)
> 
> This means that:
> - the new VIRTIO_DEVICE_PROTOCOL.UnmapSharedPages() function has to be
> called judiciously from these callbacks, after the virtio reset,
> - *and* that the *entire* call chain originating from
> UnmapSharedPages(), through PciIo, through PciRootBridgeIo, to
> EdkiiIoMmu, *must* not call gBS->FreePool() or gBS->FreePages() (or the
> equivalent MemoryAllocationLib functions).
> 
> Note that if the last requirement is currently violated (outside of
> OvmfPkg), then that is a general problem for physical platforms as well
> -- IMO, a physical NIC driver too should be able to abort DMA in its
> exit-boot-services callback and then unmap any relevant IOMMU mappings
> (via PciIo->Unmap().)
> 
> Thanks
> Laszlo
> 
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago
Hi Laszlo,


> 
> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM simply
> in parallel with VIRTIO_F_VERSION_1, and don't act upon
> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically this
> is just my point (3) from above.
> 
> 
> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
> 
> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>    virtio-pci devices, and offers virtio 0.9.5 semantics.
> 
> - "ArmVirtPkg/VirtioFdtDxe" (via "OvmfPkg/Library/VirtioMmioDeviceLib")
>    binds virtio-mmio devices, and offers virtio 0.9.5 semantics.
> 
> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and offers
>    virtio 1.0.0 semantics.
> 
> The first two drivers should implement the AllocateSharedPages() and
> FreeSharedPages() member functions simply with the corresponding
> MemoryAllocationLib functions (using BootServicesData type memory), and
> implement the MapSharedPages() and UnmapSharedPages() member functions
> as no-ops (return the input addresses transparently).
> 
> The third driver should implement all four new member functions by
> respectively delegating the job to:
> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
> - EFI_PCI_IO_PROTOCOL.Unmap()
> 

I have working to implement patch per your recommendation. I assume you mean
map the buffers with EfiPciIoOperationBusMasterCommonBuffer [1]. If so, I see
one issue with SEV guest. When SEV is active, IOMMU driver uses a bounce buffer
to map host address to a device address. While creating bounce buffer we can map
it either for EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
Operation. If caller wants to map EfiPciIoOperationBusMasterCommonBuffer then it
must allocate the buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise
we will fail to map. I see that PciRootBridgeIo.c has similar check when using
a bounce buffer for < 4GB use cases [3].

Do you see any issue if we use EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
instead of EfiPciIoOperationBusMasterCommonBuffer ?

[1] https://github.com/tianocore/edk2/blob/master/EdkCompatibilityPkg/Foundation/Efi/Protocol/PciIo/PciIo.h#L169
[2] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciBusDxe/PciIo.c#L1082
[3] https://github.com/tianocore/edk2/blob/master/MdeModulePkg/Bus/Pci/PciHostBridgeDxe/PciRootBridgeIo.c#L1109

> The EFI_PCI_IO_PROTOCOL implementation will delegate these calls to the
> platform-specific PCI host bridge / root bridge driver, and *that*
> driver in turn is allowed to talk to an IOMMU protocol (if any).
> 
> (This last step is already covered by the following edk2 commits:
> - generally, c15da8eb3587 ("MdeModulePkg/PciHostBridge: Add IOMMU
> support.", 2017-04-29),
> - specifically for SEV in OVMF, c6ab9aecb71b ("OvmfPkg: update
> PciHostBridgeDxe to use PlatformHasIoMmuLib", 2017-07-06).)
> 
> 

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 5 months ago
On 07/27/17 16:21, Brijesh Singh wrote:
> Hi Laszlo,
>
>
>>
>> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM
>> simply in parallel with VIRTIO_F_VERSION_1, and don't act upon
>> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically
>> this is just my point (3) from above.
>>
>>
>> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
>>
>> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>>    virtio-pci devices, and offers virtio 0.9.5 semantics.
>>
>> - "ArmVirtPkg/VirtioFdtDxe" (via
>>    "OvmfPkg/Library/VirtioMmioDeviceLib") binds virtio-mmio devices,
>>    and offers virtio 0.9.5 semantics.
>>
>> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and
>>    offers virtio 1.0.0 semantics.
>>
>> The first two drivers should implement the AllocateSharedPages() and
>> FreeSharedPages() member functions simply with the corresponding
>> MemoryAllocationLib functions (using BootServicesData type memory),
>> and implement the MapSharedPages() and UnmapSharedPages() member
>> functions as no-ops (return the input addresses transparently).
>>
>> The third driver should implement all four new member functions by
>> respectively delegating the job to:
>> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
>> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
>> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
>> - EFI_PCI_IO_PROTOCOL.Unmap()
>>
>
> I have working to implement patch per your recommendation. I assume
> you mean map the buffers with EfiPciIoOperationBusMasterCommonBuffer
> [1].

Yes.

> If so, I see one issue with SEV guest. When SEV is active, IOMMU
> driver uses a bounce buffer to map host address to a device address.
> While creating bounce buffer we can map it either for
> EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
> Operation. If caller wants to map
> EfiPciIoOperationBusMasterCommonBuffer then it must allocate the
> buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise we
> will fail to map.

Correct.

> I see that PciRootBridgeIo.c has similar check when using a bounce
> buffer for < 4GB use cases [3].

Correct.

> Do you see any issue if we use EfiPciIoOperationBusMasterRead or
> EfiPciIoOperationBusMasterWrite instead of
> EfiPciIoOperationBusMasterCommonBuffer ?

Yes, I do. The BusMasterRead and BusMasterWrite operations are suitable
for one-shot, uni-directional transfers, where the bounce buffer
contents and the client code contents are exchanged on Map/Unmap.

However virtio transfers, generally speaking, are not like this. While
the individual memory areas pointed-to by separate virtio descriptors
can me associated with one specific transfer direction (see the
VRING_DESC_F_WRITE flag), the virtio ring area itself is long-lived, and
it is simultaneously read and written by both host and guest,
asynchronously and repeatedly. This calls for BusMasterCommonBuffer.
Once we implement BusMasterCommonBuffer, we can use it for everything
else.


Another reason why separate allocation and mapping (and conversely,
separate unmapping and deallocation) are required is the
ExitBootServices() callbacks. In that callback, unmapping must happen
*without* memory allocation or deallocation (because the IOMMU must be
de-programmed, but the UEFI memmap must not be changed), covering all
memory areas that are at that point shared between host and guest
(regardless of transfer direction).

Normally, Map allocates the bounce buffer internally, and Unmap releases
the bounce buffer internally (for BusMasterRead and BusMasterWrite),
which is not right here. If you use AllocateBuffer() separately, then
Map() -- with BusMasterCommonBuffer -- will not allocate internally, and
Unmap() will also not deallocate internally. So, in the
ExitBootServices() callback, you will be able to call Unmap() only, and
then *not* call FreeBuffer() at all.

This is why I suggest introducing all four functions (Allocate, Free,
Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio drivers
should use all four functions explicitly, not just Map and Unmap.

... Actually, I think there is a bug in
"OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following distribution of
operations at the moment:

- IoMmuAllocateBuffer() allocates pages and clears the memory encryption
  mask

- IoMmuFreeBuffer() re-sets the memory encryption mask, and deallocates
  pages

- IoMmuMap() does nothing at all when BusMasterCommonBuffer is requested
  (and IoMmuAllocateBuffer() was called previously). Otherwise,
  IoMmuMap() allocates pages, allocates MAP_INFO, and clears the memory
  encryption mask.

- IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
  operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
  encryption mask, and frees both the pages and MAP_INFO.

This distribution of operations seems wrong. The key point is that
AllocateBuffer() *need not* result in a buffer that is immediately
usable, and that client code is required to call Map()
*unconditionally*, even if BusMasterCommonBuffer is the desired
operation. Therefore, the right distribution of operations is:

- IoMmuAllocateBuffer() allocates pages and does not touch the
  encryption mask..

- IoMmuFreeBuffer() deallocates pages and does not touch the encryption
  mask.

- IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
  requested, and it allocates pages (bounce buffer) otherwise.

  *Regardless* of BusMaster operation, the following actions are carried
  out unconditionally:

  . the memory encryption mask is cleared in this function (and in this
    function only),

  . An attempt is made to grab a MAP_INFO structure from an internal
    free list (to be introduced!). The head of the list is a new static
    variable. If the free list is empty, then a MAP_INFO structure is
    allocated with AllocatePool(). The NO_MAPPING macro becomes unused
    and can be deleted from the source code.

- IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
  has to consult the MAP_INFO structure that is being passed in from the
  caller.) In addition:

  . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
    allocation was done separately in AllocateBuffer, so we do not
    release the pages. Otherwise, we do release the pages.

  . MapInfo is linked back on the internal free list (see above). It is
    *never* released with FreePool().

  This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
  re-set the memory encryption mask) without changing the UEFI memory
  map. (I trust that MemEncryptSevSetPageEncMask() will not split page
  tables internally when it *re*sets the encryption mask -- is that
  correct?)

I'm sorry that I didn't catch this earlier in your commit f9d129e68a45
("OvmfPkg: Add IoMmuDxe driver", 2017-07-06), but as you see, I gave you
only an Acked-by on that, not a Reviewed-by. I've really had to think it
through from the virtio perspective; I didn't foresee this use case back
then (I only felt that I wasn't getting the full picture about the IOMMU
protocol details).

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Ard Biesheuvel 7 years, 5 months ago
On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
> On 07/27/17 16:21, Brijesh Singh wrote:
>> Hi Laszlo,
>>
>>
>>>
>>> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM
>>> simply in parallel with VIRTIO_F_VERSION_1, and don't act upon
>>> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically
>>> this is just my point (3) from above.
>>>
>>>
>>> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
>>>
>>> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>>>    virtio-pci devices, and offers virtio 0.9.5 semantics.
>>>
>>> - "ArmVirtPkg/VirtioFdtDxe" (via
>>>    "OvmfPkg/Library/VirtioMmioDeviceLib") binds virtio-mmio devices,
>>>    and offers virtio 0.9.5 semantics.
>>>
>>> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and
>>>    offers virtio 1.0.0 semantics.
>>>
>>> The first two drivers should implement the AllocateSharedPages() and
>>> FreeSharedPages() member functions simply with the corresponding
>>> MemoryAllocationLib functions (using BootServicesData type memory),
>>> and implement the MapSharedPages() and UnmapSharedPages() member
>>> functions as no-ops (return the input addresses transparently).
>>>
>>> The third driver should implement all four new member functions by
>>> respectively delegating the job to:
>>> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
>>> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
>>> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
>>> - EFI_PCI_IO_PROTOCOL.Unmap()
>>>
>>
>> I have working to implement patch per your recommendation. I assume
>> you mean map the buffers with EfiPciIoOperationBusMasterCommonBuffer
>> [1].
>
> Yes.
>
>> If so, I see one issue with SEV guest. When SEV is active, IOMMU
>> driver uses a bounce buffer to map host address to a device address.
>> While creating bounce buffer we can map it either for
>> EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
>> Operation. If caller wants to map
>> EfiPciIoOperationBusMasterCommonBuffer then it must allocate the
>> buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise we
>> will fail to map.
>
> Correct.
>
>> I see that PciRootBridgeIo.c has similar check when using a bounce
>> buffer for < 4GB use cases [3].
>
> Correct.
>
>> Do you see any issue if we use EfiPciIoOperationBusMasterRead or
>> EfiPciIoOperationBusMasterWrite instead of
>> EfiPciIoOperationBusMasterCommonBuffer ?
>
> Yes, I do. The BusMasterRead and BusMasterWrite operations are suitable
> for one-shot, uni-directional transfers, where the bounce buffer
> contents and the client code contents are exchanged on Map/Unmap.
>
> However virtio transfers, generally speaking, are not like this. While
> the individual memory areas pointed-to by separate virtio descriptors
> can me associated with one specific transfer direction (see the
> VRING_DESC_F_WRITE flag), the virtio ring area itself is long-lived, and
> it is simultaneously read and written by both host and guest,
> asynchronously and repeatedly. This calls for BusMasterCommonBuffer.
> Once we implement BusMasterCommonBuffer, we can use it for everything
> else.
>
>
> Another reason why separate allocation and mapping (and conversely,
> separate unmapping and deallocation) are required is the
> ExitBootServices() callbacks. In that callback, unmapping must happen
> *without* memory allocation or deallocation (because the IOMMU must be
> de-programmed, but the UEFI memmap must not be changed), covering all
> memory areas that are at that point shared between host and guest
> (regardless of transfer direction).
>
> Normally, Map allocates the bounce buffer internally, and Unmap releases
> the bounce buffer internally (for BusMasterRead and BusMasterWrite),
> which is not right here. If you use AllocateBuffer() separately, then
> Map() -- with BusMasterCommonBuffer -- will not allocate internally, and
> Unmap() will also not deallocate internally. So, in the
> ExitBootServices() callback, you will be able to call Unmap() only, and
> then *not* call FreeBuffer() at all.
>
> This is why I suggest introducing all four functions (Allocate, Free,
> Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio drivers
> should use all four functions explicitly, not just Map and Unmap.
>
> ... Actually, I think there is a bug in
> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following distribution of
> operations at the moment:
>
> - IoMmuAllocateBuffer() allocates pages and clears the memory encryption
>   mask
>
> - IoMmuFreeBuffer() re-sets the memory encryption mask, and deallocates
>   pages
>
> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is requested
>   (and IoMmuAllocateBuffer() was called previously). Otherwise,
>   IoMmuMap() allocates pages, allocates MAP_INFO, and clears the memory
>   encryption mask.
>
> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>   operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>   encryption mask, and frees both the pages and MAP_INFO.
>
> This distribution of operations seems wrong. The key point is that
> AllocateBuffer() *need not* result in a buffer that is immediately
> usable, and that client code is required to call Map()
> *unconditionally*, even if BusMasterCommonBuffer is the desired
> operation. Therefore, the right distribution of operations is:
>
> - IoMmuAllocateBuffer() allocates pages and does not touch the
>   encryption mask..
>
> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>   mask.
>
> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>   requested, and it allocates pages (bounce buffer) otherwise.
>
>   *Regardless* of BusMaster operation, the following actions are carried
>   out unconditionally:
>
>   . the memory encryption mask is cleared in this function (and in this
>     function only),
>
>   . An attempt is made to grab a MAP_INFO structure from an internal
>     free list (to be introduced!). The head of the list is a new static
>     variable. If the free list is empty, then a MAP_INFO structure is
>     allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>     and can be deleted from the source code.
>
> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>   has to consult the MAP_INFO structure that is being passed in from the
>   caller.) In addition:
>
>   . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>     allocation was done separately in AllocateBuffer, so we do not
>     release the pages. Otherwise, we do release the pages.
>
>   . MapInfo is linked back on the internal free list (see above). It is
>     *never* released with FreePool().
>
>   This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>   re-set the memory encryption mask) without changing the UEFI memory
>   map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>   tables internally when it *re*sets the encryption mask -- is that
>   correct?)
>
> I'm sorry that I didn't catch this earlier in your commit f9d129e68a45
> ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06), but as you see, I gave you
> only an Acked-by on that, not a Reviewed-by. I've really had to think it
> through from the virtio perspective; I didn't foresee this use case back
> then (I only felt that I wasn't getting the full picture about the IOMMU
> protocol details).
>

I have to concur with Laszlo here: the respective semantics of the
allocate/map/unmap/free operations should be identical to their
counterparts in the PCI I/O protocol, and in the DmaLib in
EmbeddedPkg.

Note that this is likely to cause problems with proprietary x86
drivers in option ROMs, which are used to getting away with using host
addresses for DMA, and fail to invoke Map() for common buffers (or
fail to invoke AllocateBuffer() altogether). The API is clear, and
drivers that abide by the rules should operate correctly even when
using encrypted memory or non-1:1 mapping between the host and PCI
address space (which is how I ran into these issues)
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago

On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
>> On 07/27/17 16:21, Brijesh Singh wrote:
>>> Hi Laszlo,
>>>
>>>
>>>>
>>>> (5.3) All virtio driver code should treat VIRTIO_F_IOMMU_PLATFORM
>>>> simply in parallel with VIRTIO_F_VERSION_1, and don't act upon
>>>> VIRTIO_F_IOMMU_PLATFORM in any special shape or form. So basically
>>>> this is just my point (3) from above.
>>>>
>>>>
>>>> (5.4) There are three VIRTIO_DEVICE_PROTOCOL implementations in edk2:
>>>>
>>>> - "OvmfPkg/VirtioPciDeviceDxe" binds legacy-only and transitional
>>>>     virtio-pci devices, and offers virtio 0.9.5 semantics.
>>>>
>>>> - "ArmVirtPkg/VirtioFdtDxe" (via
>>>>     "OvmfPkg/Library/VirtioMmioDeviceLib") binds virtio-mmio devices,
>>>>     and offers virtio 0.9.5 semantics.
>>>>
>>>> - "OvmfPkg/Virtio10Dxe" binds modern-only virtio-pci devices, and
>>>>     offers virtio 1.0.0 semantics.
>>>>
>>>> The first two drivers should implement the AllocateSharedPages() and
>>>> FreeSharedPages() member functions simply with the corresponding
>>>> MemoryAllocationLib functions (using BootServicesData type memory),
>>>> and implement the MapSharedPages() and UnmapSharedPages() member
>>>> functions as no-ops (return the input addresses transparently).
>>>>
>>>> The third driver should implement all four new member functions by
>>>> respectively delegating the job to:
>>>> - EFI_PCI_IO_PROTOCOL.AllocateBuffer() -- with BootServicesData --
>>>> - EFI_PCI_IO_PROTOCOL.FreeBuffer()
>>>> - EFI_PCI_IO_PROTOCOL.Map() -- with BusMasterCommonBuffer64 --
>>>> - EFI_PCI_IO_PROTOCOL.Unmap()
>>>>
>>>
>>> I have working to implement patch per your recommendation. I assume
>>> you mean map the buffers with EfiPciIoOperationBusMasterCommonBuffer
>>> [1].
>>
>> Yes.
>>
>>> If so, I see one issue with SEV guest. When SEV is active, IOMMU
>>> driver uses a bounce buffer to map host address to a device address.
>>> While creating bounce buffer we can map it either for
>>> EfiPciIoOperationBusMasterRead or EfiPciIoOperationBusMasterWrite
>>> Operation. If caller wants to map
>>> EfiPciIoOperationBusMasterCommonBuffer then it must allocate the
>>> buffer using EFI_PCI_IO_PROTOCOL.AllocateBuffer() [2] otherwise we
>>> will fail to map.
>>
>> Correct.
>>
>>> I see that PciRootBridgeIo.c has similar check when using a bounce
>>> buffer for < 4GB use cases [3].
>>
>> Correct.
>>
>>> Do you see any issue if we use EfiPciIoOperationBusMasterRead or
>>> EfiPciIoOperationBusMasterWrite instead of
>>> EfiPciIoOperationBusMasterCommonBuffer ?
>>
>> Yes, I do. The BusMasterRead and BusMasterWrite operations are suitable
>> for one-shot, uni-directional transfers, where the bounce buffer
>> contents and the client code contents are exchanged on Map/Unmap.
>>
>> However virtio transfers, generally speaking, are not like this. While
>> the individual memory areas pointed-to by separate virtio descriptors
>> can me associated with one specific transfer direction (see the
>> VRING_DESC_F_WRITE flag), the virtio ring area itself is long-lived, and
>> it is simultaneously read and written by both host and guest,
>> asynchronously and repeatedly. This calls for BusMasterCommonBuffer.
>> Once we implement BusMasterCommonBuffer, we can use it for everything
>> else.
>>
>>
>> Another reason why separate allocation and mapping (and conversely,
>> separate unmapping and deallocation) are required is the
>> ExitBootServices() callbacks. In that callback, unmapping must happen
>> *without* memory allocation or deallocation (because the IOMMU must be
>> de-programmed, but the UEFI memmap must not be changed), covering all
>> memory areas that are at that point shared between host and guest
>> (regardless of transfer direction).
>>
>> Normally, Map allocates the bounce buffer internally, and Unmap releases
>> the bounce buffer internally (for BusMasterRead and BusMasterWrite),
>> which is not right here. If you use AllocateBuffer() separately, then
>> Map() -- with BusMasterCommonBuffer -- will not allocate internally, and
>> Unmap() will also not deallocate internally. So, in the
>> ExitBootServices() callback, you will be able to call Unmap() only, and
>> then *not* call FreeBuffer() at all.
>>
>> This is why I suggest introducing all four functions (Allocate, Free,
>> Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio drivers
>> should use all four functions explicitly, not just Map and Unmap.
>>
>> ... Actually, I think there is a bug in
>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following distribution of
>> operations at the moment:
>>
>> - IoMmuAllocateBuffer() allocates pages and clears the memory encryption
>>    mask
>>
>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and deallocates
>>    pages
>>
>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is requested
>>    (and IoMmuAllocateBuffer() was called previously). Otherwise,
>>    IoMmuMap() allocates pages, allocates MAP_INFO, and clears the memory
>>    encryption mask.
>>
>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>    operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>    encryption mask, and frees both the pages and MAP_INFO.
>>

I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And introduce
list to track if the buffer was allocated by us. If buffer was allocated by
us then we can safely say that it does not require a bounce buffer. While
implementing the initial code I was not able to see BusMasterCommonBuffer
usage. But with virtio things are getting a bit more clear in my mind.

>> This distribution of operations seems wrong. The key point is that
>> AllocateBuffer() *need not* result in a buffer that is immediately
>> usable, and that client code is required to call Map()
>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>> operation. Therefore, the right distribution of operations is:
>>
>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>    encryption mask..
>>
>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>    mask.
>>

Actually one of main reason why we cleared and restored the memory encryption mask
during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
as a method to allocate and free a DMA buffer. I am certainly open to suggestions.

[1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
[2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197

>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>    requested, and it allocates pages (bounce buffer) otherwise.
>>

I am trying to wrap my head around how we can support BusMasterCommonBuffer
when buffer was not allocated by us. Changing the memory encryption mask in
a page table will not update the contents. Also since the memory encryption
mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).

>>    *Regardless* of BusMaster operation, the following actions are carried
>>    out unconditionally:
>>
>>    . the memory encryption mask is cleared in this function (and in this
>>      function only),
>>
>>    . An attempt is made to grab a MAP_INFO structure from an internal
>>      free list (to be introduced!). The head of the list is a new static
>>      variable. If the free list is empty, then a MAP_INFO structure is
>>      allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>      and can be deleted from the source code.
>>
>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>    has to consult the MAP_INFO structure that is being passed in from the
>>    caller.) In addition:
>>
>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>      allocation was done separately in AllocateBuffer, so we do not
>>      release the pages. Otherwise, we do release the pages.
>>
>>    . MapInfo is linked back on the internal free list (see above). It is
>>      *never* released with FreePool().
>>
>>    This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>    re-set the memory encryption mask) without changing the UEFI memory
>>    map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>    tables internally when it *re*sets the encryption mask -- is that
>>    correct?)

Yes, MemEncryptSevSetPageEncMask() will not split pages when it restores mask.

>>>> I'm sorry that I didn't catch this earlier in your commit f9d129e68a45
>> ("OvmfPkg: Add IoMmuDxe driver", 2017-07-06), but as you see, I gave you
>> only an Acked-by on that, not a Reviewed-by. I've really had to think it
>> through from the virtio perspective; I didn't foresee this use case back
>> then (I only felt that I wasn't getting the full picture about the IOMMU
>> protocol details).
>>

> 
> I have to concur with Laszlo here: the respective semantics of the
> allocate/map/unmap/free operations should be identical to their
> counterparts in the PCI I/O protocol, and in the DmaLib in
> EmbeddedPkg.
> 
> Note that this is likely to cause problems with proprietary x86
> drivers in option ROMs, which are used to getting away with using host
> addresses for DMA, and fail to invoke Map() for common buffers (or
> fail to invoke AllocateBuffer() altogether). The API is clear, and
> drivers that abide by the rules should operate correctly even when
> using encrypted memory or non-1:1 mapping between the host and PCI
> address space (which is how I ran into these issues)
> 
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago

On 07/27/2017 02:00 PM, Brijesh Singh wrote:

>>> This distribution of operations seems wrong. The key point is that
>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>> usable, and that client code is required to call Map()
>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>> operation. Therefore, the right distribution of operations is:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>    encryption mask..
>>>
>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>    mask.
>>>
> 
> Actually one of main reason why we cleared and restored the memory encryption mask
> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
> 
> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
> 
>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>
> 
> I am trying to wrap my head around how we can support BusMasterCommonBuffer
> when buffer was not allocated by us. Changing the memory encryption mask in
> a page table will not update the contents. Also since the memory encryption
> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
> 

I may be missing something in my understanding. Here is a flow I have in my
mind, please correct me.

OvmfPkg/VirtIoBlk.c:

VirtioBlkInit()
   ....
   ....
   VirtioRingInit
     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
       PciIo->AllocatePages(RingSize, &RingAddress)
     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
     .....
     .....

This case is straight forward and we can easily maps. No need for bounce buffering.

VirtioBlkReadBlocks(..., BufferSize, Buffer,)
   ......
   ......
   SynchronousRequest(..., BufferSize, Buffer)
     ....
     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
     VirtioFlush (...)
     

In the above case, "Buffer" was not allocated by us hence we will not able to change the
memory encryption attributes. Am I missing something in the flow ?


>>>    *Regardless* of BusMaster operation, the following actions are carried
>>>    out unconditionally:
>>>
>>>    . the memory encryption mask is cleared in this function (and in this
>>>      function only),
>>>
>>>    . An attempt is made to grab a MAP_INFO structure from an internal
>>>      free list (to be introduced!). The head of the list is a new static
>>>      variable. If the free list is empty, then a MAP_INFO structure is
>>>      allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>      and can be deleted from the source code.
>>>
>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>    has to consult the MAP_INFO structure that is being passed in from the
>>>    caller.) In addition:
>>>
>>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>      allocation was done separately in AllocateBuffer, so we do not
>>>      release the pages. Otherwise, we do release the pages.
>>>
>>>    . MapInfo is linked back on the internal free list (see above). It is
>>>      *never* released with FreePool().
>>>
>>>    This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>    re-set the memory encryption mask) without changing the UEFI memory
>>>    map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>    tables internally when it *re*sets the encryption mask -- is that
>>>    correct?)
> 



_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Ard Biesheuvel 7 years, 5 months ago
> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
> 
> 
> 
> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
> 
>>>> This distribution of operations seems wrong. The key point is that
>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>> usable, and that client code is required to call Map()
>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>> operation. Therefore, the right distribution of operations is:
>>>> 
>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>   encryption mask..
>>>> 
>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>>   mask.
>>>> 
>> Actually one of main reason why we cleared and restored the memory encryption mask
>> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
>> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
>> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>   requested, and it allocates pages (bounce buffer) otherwise.
>>>> 
>> I am trying to wrap my head around how we can support BusMasterCommonBuffer
>> when buffer was not allocated by us. Changing the memory encryption mask in
>> a page table will not update the contents. Also since the memory encryption
>> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
> 
> I may be missing something in my understanding. Here is a flow I have in my
> mind, please correct me.
> 
> OvmfPkg/VirtIoBlk.c:
> 
> VirtioBlkInit()
>  ....
>  ....
>  VirtioRingInit
>    Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>      PciIo->AllocatePages(RingSize, &RingAddress)
>    Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
>    .....
>    .....
> 
> This case is straight forward and we can easily maps. No need for bounce buffering.
> 
> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>  ......
>  ......
>  SynchronousRequest(..., BufferSize, Buffer)
>    ....
>    Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
>    VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>    VirtioFlush (...)
>    
> In the above case, "Buffer" was not allocated by us hence we will not able to change the
> memory encryption attributes. Am I missing something in the flow ?
> 


Common buffer mappings may only be created from buffers that were allocated by AllocateBuffer(). In fact, that is its main purpose
> 
>>>>   *Regardless* of BusMaster operation, the following actions are carried
>>>>   out unconditionally:
>>>> 
>>>>   . the memory encryption mask is cleared in this function (and in this
>>>>     function only),
>>>> 
>>>>   . An attempt is made to grab a MAP_INFO structure from an internal
>>>>     free list (to be introduced!). The head of the list is a new static
>>>>     variable. If the free list is empty, then a MAP_INFO structure is
>>>>     allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>>     and can be deleted from the source code.
>>>> 
>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>>   has to consult the MAP_INFO structure that is being passed in from the
>>>>   caller.) In addition:
>>>> 
>>>>   . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>>     allocation was done separately in AllocateBuffer, so we do not
>>>>     release the pages. Otherwise, we do release the pages.
>>>> 
>>>>   . MapInfo is linked back on the internal free list (see above). It is
>>>>     *never* released with FreePool().
>>>> 
>>>>   This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>>   re-set the memory encryption mask) without changing the UEFI memory
>>>>   map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>>   tables internally when it *re*sets the encryption mask -- is that
>>>>   correct?)
> 
> 
> 
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Andrew Fish 7 years, 5 months ago
> On Jul 27, 2017, at 2:31 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> 
>> 
>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>> 
>> 
>> 
>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>> 
>>>>> This distribution of operations seems wrong. The key point is that
>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>> usable, and that client code is required to call Map()
>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>> operation. Therefore, the right distribution of operations is:
>>>>> 
>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>  encryption mask..
>>>>> 
>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>>>  mask.
>>>>> 
>>> Actually one of main reason why we cleared and restored the memory encryption mask
>>> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
>>> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
>>> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>  requested, and it allocates pages (bounce buffer) otherwise.
>>>>> 
>>> I am trying to wrap my head around how we can support BusMasterCommonBuffer
>>> when buffer was not allocated by us. Changing the memory encryption mask in
>>> a page table will not update the contents. Also since the memory encryption
>>> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
>> 
>> I may be missing something in my understanding. Here is a flow I have in my
>> mind, please correct me.
>> 
>> OvmfPkg/VirtIoBlk.c:
>> 
>> VirtioBlkInit()
>> ....
>> ....
>> VirtioRingInit
>>   Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>     PciIo->AllocatePages(RingSize, &RingAddress)
>>   Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
>>   .....
>>   .....
>> 
>> This case is straight forward and we can easily maps. No need for bounce buffering.
>> 
>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>> ......
>> ......
>> SynchronousRequest(..., BufferSize, Buffer)
>>   ....
>>   Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
>>   VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>   VirtioFlush (...)
>> 
>> In the above case, "Buffer" was not allocated by us hence we will not able to change the
>> memory encryption attributes. Am I missing something in the flow ?
>> 
> 
> 
> Common buffer mappings may only be created from buffers that were allocated by AllocateBuffer(). In fact, that is its main purpose

Brijesh,

If you look in the UEFI Spec 13.4 EFI PCI I/O Protocol there is a good write on DMA. 

DMA Bus Master Read Operation
=========================
Call Map() for EfiPciIoOperationBusMasterRead.
Program the DMA Bus Master with the DeviceAddress returned by Map(). Start the DMA Bus Master.
Wait for DMA Bus Master to complete the read operation.
Call Unmap().


DMA Bus Master Write Operation
==========================
Call Map() for EfiPciOperationBusMasterWrite.
Program the DMA Bus Master with the DeviceAddress returned by Map().
Start the DMA Bus Master.
Wait for DMA Bus Master to complete the write operation.
Perform a PCI controller specific read transaction to flush all PCI write buffers (See PCI Specification Section 3.2.5.2) .
Call Flush().
Call Unmap().

DMA Bus Master Common Buffer Operation
==================================
Call AllocateBuffer() to allocate a common buffer.
Call Map() for EfiPciIoOperationBusMasterCommonBuffer.
Program the DMA Bus Master with the DeviceAddress returned by Map().
The common buffer can now be accessed equally by the processor and the DMA bus master. Call Unmap().
Call FreeBuffer().

Thanks,

Andrew Fish


>> 
>>>>>  *Regardless* of BusMaster operation, the following actions are carried
>>>>>  out unconditionally:
>>>>> 
>>>>>  . the memory encryption mask is cleared in this function (and in this
>>>>>    function only),
>>>>> 
>>>>>  . An attempt is made to grab a MAP_INFO structure from an internal
>>>>>    free list (to be introduced!). The head of the list is a new static
>>>>>    variable. If the free list is empty, then a MAP_INFO structure is
>>>>>    allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>>>    and can be deleted from the source code.
>>>>> 
>>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>>>  has to consult the MAP_INFO structure that is being passed in from the
>>>>>  caller.) In addition:
>>>>> 
>>>>>  . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>>>    allocation was done separately in AllocateBuffer, so we do not
>>>>>    release the pages. Otherwise, we do release the pages.
>>>>> 
>>>>>  . MapInfo is linked back on the internal free list (see above). It is
>>>>>    *never* released with FreePool().
>>>>> 
>>>>>  This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>>>  re-set the memory encryption mask) without changing the UEFI memory
>>>>>  map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>>>  tables internally when it *re*sets the encryption mask -- is that
>>>>>  correct?)
>> 
>> 
>> 
> _______________________________________________
> edk2-devel mailing list
> edk2-devel@lists.01.org
> https://lists.01.org/mailman/listinfo/edk2-devel

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago
Hi Andrew,

On 07/27/2017 04:38 PM, Andrew Fish wrote:
> 
> Brijesh,
> 
> If you look in the UEFI Spec 13.4 EFI PCI I/O Protocol there is a good write on DMA.
> 
> DMA Bus Master Read Operation
> =========================
> Call Map() for EfiPciIoOperationBusMasterRead.
> Program the DMA Bus Master with the DeviceAddress returned by Map(). Start the DMA Bus Master.
> Wait for DMA Bus Master to complete the read operation.
> Call Unmap().
> 
> 
> DMA Bus Master Write Operation
> ==========================
> Call Map() for EfiPciOperationBusMasterWrite.
> Program the DMA Bus Master with the DeviceAddress returned by Map().
> Start the DMA Bus Master.
> Wait for DMA Bus Master to complete the write operation.
> Perform a PCI controller specific read transaction to flush all PCI write buffers (See PCI Specification Section 3.2.5.2) .
> Call Flush().
> Call Unmap().
> 
> DMA Bus Master Common Buffer Operation
> ==================================
> Call AllocateBuffer() to allocate a common buffer.
> Call Map() for EfiPciIoOperationBusMasterCommonBuffer.
> Program the DMA Bus Master with the DeviceAddress returned by Map().
> The common buffer can now be accessed equally by the processor and the DMA bus master. Call Unmap().
> Call FreeBuffer().
> 

Thanks for the link.

-Brijesh
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago

On 07/27/2017 04:31 PM, Ard Biesheuvel wrote:
> 
>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>
>>
>>
>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>>
>>>>> This distribution of operations seems wrong. The key point is that
>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>> usable, and that client code is required to call Map()
>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>> operation. Therefore, the right distribution of operations is:
>>>>>
>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>    encryption mask..
>>>>>
>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the encryption
>>>>>    mask.
>>>>>
>>> Actually one of main reason why we cleared and restored the memory encryption mask
>>> during allocate/free is because we also consume the IOMMU protocol in QemuFwCfgLib
>>> as a method to allocate and free a DMA buffer. I am certainly open to suggestions.
>>> [1] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>> [2] https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>>
>>> I am trying to wrap my head around how we can support BusMasterCommonBuffer
>>> when buffer was not allocated by us. Changing the memory encryption mask in
>>> a page table will not update the contents. Also since the memory encryption
>>> mask works on PAGE_SIZE hence changing the encryption mask on not our allocated
>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
>>
>> I may be missing something in my understanding. Here is a flow I have in my
>> mind, please correct me.
>>
>> OvmfPkg/VirtIoBlk.c:
>>
>> VirtioBlkInit()
>>   ....
>>   ....
>>   VirtioRingInit
>>     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>       PciIo->AllocatePages(RingSize, &RingAddress)
>>     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base, RingSize, &RingDeviceAddress)
>>     .....
>>     .....
>>
>> This case is straight forward and we can easily maps. No need for bounce buffering.
>>
>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>>   ......
>>   ......
>>   SynchronousRequest(..., BufferSize, Buffer)
>>     ....
>>     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer, BufferSize, &DeviceAddress)
>>     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>     VirtioFlush (...)
>>     
>> In the above case, "Buffer" was not allocated by us hence we will not able to change the
>> memory encryption attributes. Am I missing something in the flow ?
>>
> 
> 
> Common buffer mappings may only be created from buffers that were allocated by AllocateBuffer(). In fact, that is its main purpose

Yes, that part is well understood. If the buffer was allocated by us (e.g vring, request/status
structure etc) then those should be mapped as "BusMasterCommonBuffer".

But I am trying to figure out, how to map a data buffers before issuing a virtio request. e.g when
VirtioBlkReadBlocks() is called, "Buffer" pointer is not a DMA address hence we need to map it.
I think it should be mapped using "BusMasterWrite" not "BusMasterCommonBuffer" before adding into vring.


>>
>>>>>    *Regardless* of BusMaster operation, the following actions are carried
>>>>>    out unconditionally:
>>>>>
>>>>>    . the memory encryption mask is cleared in this function (and in this
>>>>>      function only),
>>>>>
>>>>>    . An attempt is made to grab a MAP_INFO structure from an internal
>>>>>      free list (to be introduced!). The head of the list is a new static
>>>>>      variable. If the free list is empty, then a MAP_INFO structure is
>>>>>      allocated with AllocatePool(). The NO_MAPPING macro becomes unused
>>>>>      and can be deleted from the source code.
>>>>>
>>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For this, it
>>>>>    has to consult the MAP_INFO structure that is being passed in from the
>>>>>    caller.) In addition:
>>>>>
>>>>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know the
>>>>>      allocation was done separately in AllocateBuffer, so we do not
>>>>>      release the pages. Otherwise, we do release the pages.
>>>>>
>>>>>    . MapInfo is linked back on the internal free list (see above). It is
>>>>>      *never* released with FreePool().
>>>>>
>>>>>    This approach guarantees that IoMmuUnmap() can de-program the IOMMU (=
>>>>>    re-set the memory encryption mask) without changing the UEFI memory
>>>>>    map. (I trust that MemEncryptSevSetPageEncMask() will not split page
>>>>>    tables internally when it *re*sets the encryption mask -- is that
>>>>>    correct?)
>>
>>
>>
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Ard Biesheuvel 7 years, 5 months ago
On 27 July 2017 at 23:10, Brijesh Singh <brijesh.singh@amd.com> wrote:
>
>
> On 07/27/2017 04:31 PM, Ard Biesheuvel wrote:
>>
>>
>>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>>
>>>
>>>
>>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>>>
>>>>>> This distribution of operations seems wrong. The key point is that
>>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>>> usable, and that client code is required to call Map()
>>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>>> operation. Therefore, the right distribution of operations is:
>>>>>>
>>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>>    encryption mask..
>>>>>>
>>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>>>> encryption
>>>>>>    mask.
>>>>>>
>>>> Actually one of main reason why we cleared and restored the memory
>>>> encryption mask
>>>> during allocate/free is because we also consume the IOMMU protocol in
>>>> QemuFwCfgLib
>>>> as a method to allocate and free a DMA buffer. I am certainly open to
>>>> suggestions.
>>>> [1]
>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>>> [2]
>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>>>
>>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>>>
>>>> I am trying to wrap my head around how we can support
>>>> BusMasterCommonBuffer
>>>> when buffer was not allocated by us. Changing the memory encryption mask
>>>> in
>>>> a page table will not update the contents. Also since the memory
>>>> encryption
>>>> mask works on PAGE_SIZE hence changing the encryption mask on not our
>>>> allocated
>>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE
>>>> aligned).
>>>
>>>
>>> I may be missing something in my understanding. Here is a flow I have in
>>> my
>>> mind, please correct me.
>>>
>>> OvmfPkg/VirtIoBlk.c:
>>>
>>> VirtioBlkInit()
>>>   ....
>>>   ....
>>>   VirtioRingInit
>>>     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>>       PciIo->AllocatePages(RingSize, &RingAddress)
>>>     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base,
>>> RingSize, &RingDeviceAddress)
>>>     .....
>>>     .....
>>>
>>> This case is straight forward and we can easily maps. No need for bounce
>>> buffering.
>>>
>>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>>>   ......
>>>   ......
>>>   SynchronousRequest(..., BufferSize, Buffer)
>>>     ....
>>>     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer,
>>> BufferSize, &DeviceAddress)
>>>     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>>     VirtioFlush (...)
>>>     In the above case, "Buffer" was not allocated by us hence we will not
>>> able to change the
>>> memory encryption attributes. Am I missing something in the flow ?
>>>
>>
>>
>> Common buffer mappings may only be created from buffers that were
>> allocated by AllocateBuffer(). In fact, that is its main purpose
>
>
> Yes, that part is well understood. If the buffer was allocated by us (e.g
> vring, request/status
> structure etc) then those should be mapped as "BusMasterCommonBuffer".
>
> But I am trying to figure out, how to map a data buffers before issuing a
> virtio request. e.g when
> VirtioBlkReadBlocks() is called, "Buffer" pointer is not a DMA address hence
> we need to map it.
> I think it should be mapped using "BusMasterWrite" not
> "BusMasterCommonBuffer" before adding into vring.
>

If the transfer is strictly unidirectional, then that should work. If
the transfer goes both ways, you may need to map/unmap for read and
then map/unmap for write
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 5 months ago
On 07/28/17 10:39, Ard Biesheuvel wrote:
> On 27 July 2017 at 23:10, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>
>>
>> On 07/27/2017 04:31 PM, Ard Biesheuvel wrote:
>>>
>>>
>>>> On 27 Jul 2017, at 21:55, Brijesh Singh <brijesh.singh@amd.com> wrote:
>>>>
>>>>
>>>>
>>>> On 07/27/2017 02:00 PM, Brijesh Singh wrote:
>>>>
>>>>>>> This distribution of operations seems wrong. The key point is that
>>>>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>>>>> usable, and that client code is required to call Map()
>>>>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>>>>> operation. Therefore, the right distribution of operations is:
>>>>>>>
>>>>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>>>>    encryption mask..
>>>>>>>
>>>>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>>>>> encryption
>>>>>>>    mask.
>>>>>>>
>>>>> Actually one of main reason why we cleared and restored the memory
>>>>> encryption mask
>>>>> during allocate/free is because we also consume the IOMMU protocol in
>>>>> QemuFwCfgLib
>>>>> as a method to allocate and free a DMA buffer. I am certainly open to
>>>>> suggestions.
>>>>> [1]
>>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>>>> [2]
>>>>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>>>>>>
>>>>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>>>>
>>>>> I am trying to wrap my head around how we can support
>>>>> BusMasterCommonBuffer
>>>>> when buffer was not allocated by us. Changing the memory encryption mask
>>>>> in
>>>>> a page table will not update the contents. Also since the memory
>>>>> encryption
>>>>> mask works on PAGE_SIZE hence changing the encryption mask on not our
>>>>> allocated
>>>>> buffer could mess things up (e.g if NumberOfBytes is not PAGE_SIZE
>>>>> aligned).
>>>>
>>>>
>>>> I may be missing something in my understanding. Here is a flow I have in
>>>> my
>>>> mind, please correct me.
>>>>
>>>> OvmfPkg/VirtIoBlk.c:
>>>>
>>>> VirtioBlkInit()
>>>>   ....
>>>>   ....
>>>>   VirtioRingInit
>>>>     Virtio->AllocateSharedPages(RingSize, &Ring->Base)
>>>>       PciIo->AllocatePages(RingSize, &RingAddress)
>>>>     Virtio->MapSharedPages(...,BusMasterCommonBuffer, Ring->Base,
>>>> RingSize, &RingDeviceAddress)
>>>>     .....
>>>>     .....
>>>>
>>>> This case is straight forward and we can easily maps. No need for bounce
>>>> buffering.
>>>>
>>>> VirtioBlkReadBlocks(..., BufferSize, Buffer,)
>>>>   ......
>>>>   ......
>>>>   SynchronousRequest(..., BufferSize, Buffer)
>>>>     ....
>>>>     Virtio->MapSharedPages(..., BusMasterCommonBuffer, Buffer,
>>>> BufferSize, &DeviceAddress)
>>>>     VirtioAppendDesc(DeviceAddress, BufferSize, ...)
>>>>     VirtioFlush (...)
>>>>     In the above case, "Buffer" was not allocated by us hence we will not
>>>> able to change the
>>>> memory encryption attributes. Am I missing something in the flow ?
>>>>
>>>
>>>
>>> Common buffer mappings may only be created from buffers that were
>>> allocated by AllocateBuffer(). In fact, that is its main purpose
>>
>>
>> Yes, that part is well understood. If the buffer was allocated by us (e.g
>> vring, request/status
>> structure etc) then those should be mapped as "BusMasterCommonBuffer".

Brijesh, thanks for the clarification. Previously I replied (at length)
to your paragraph that said "trying to wrap my head around...", and it
wasn't clear what you meant by "allocated by us". In my previous
response, I assumed that you meant a distinction between "allocated in
Map()" vs. "allocated in AllocateBuffer()". I stand by my earlier answer
for that (assumed) distinction, but now I see that you meant something else.

>>
>> But I am trying to figure out, how to map a data buffers before issuing a
>> virtio request. e.g when
>> VirtioBlkReadBlocks() is called, "Buffer" pointer is not a DMA address hence
>> we need to map it.
>> I think it should be mapped using "BusMasterWrite" not
>> "BusMasterCommonBuffer" before adding into vring.
>>
> 
> If the transfer is strictly unidirectional, then that should work. If
> the transfer goes both ways, you may need to map/unmap for read and
> then map/unmap for write
> 

You (Brijesh and Ard) are both right. This question depends on the
outermost interface that the specific virtio driver provides. In this
case, VirtioBlkReadBlocks() implements
EFI_BLOCK_IO_PROTOCOL.ReadBlocks(). The buffer is owned by an
independent agent, and it is guaranteed by the ReadBlocks() interface
that the transfer is unidirectional. So a standalone Map() call with
BusMasterWrite is appropriate, followed by a standalone Unmap() call
*before* VirtioBlkReadBlocks() returns. In this sense, my earlier
request to "*always* use AllocateBuffer + Map" was too strict.

However, in the *general* case, the recommendation remains the same. For
the virtio-net driver for example, the interfaces are not synchronous.
E.g., while EFI_BLOCK_IO_PROTOCOL.WriteBlocks() is synchronous,
EFI_SIMPLE_NETWORK_PROTOCOL.Transmit() is not. So, although in
VirtioNetTransmit() we might be tempted to use BusMasterRead, that's not
right, because ExitBootServices() could be called before the SNP client
collects the buffer with VirtioNetGetStatus().

The ExitBootServices() callback will have to Unmap() the area without
freeing it, and that's only possible if BusMasterCommonBuffer is used in
VirtioNetTransmit() to begin with. This means that we'll have to save
the client's data -- after updating it according to HeaderSize -- with
AllocateBuffer() in VirtioNetTransmit(), Map() *that* as
BusMasterCommonBuffer, and undo both steps in VirtioNetGetStatus(). And,
in the exit-boot-services callback, it has to be Unmap()ped only, but
not freed.

Basically it depends upon whether you can complete the entire operation
synchronously, before the outermost protocol interface returns.

I recommend that we work on the IoMmuDxe and QemuFwCfgLib updates first.
(And, my apologies again for not catching these issues immediately; as I
said, this is my first time doing non-1:1 DMA.)

Thanks,
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 5 months ago
On 07/27/17 21:00, Brijesh Singh wrote:
> On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
>> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:

>>> Normally, Map allocates the bounce buffer internally, and Unmap
>>> releases the bounce buffer internally (for BusMasterRead and
>>> BusMasterWrite), which is not right here. If you use
>>> AllocateBuffer() separately, then Map() -- with
>>> BusMasterCommonBuffer -- will not allocate internally, and Unmap()
>>> will also not deallocate internally. So, in the ExitBootServices()
>>> callback, you will be able to call Unmap() only, and then *not* call
>>> FreeBuffer() at all.
>>>
>>> This is why I suggest introducing all four functions (Allocate,
>>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio
>>> drivers should use all four functions explicitly, not just Map and
>>> Unmap.
>>>
>>> ... Actually, I think there is a bug in
>>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following
>>> distribution of operations at the moment:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and clears the memory
>>>   encryption mask
>>>
>>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and
>>>   deallocates pages
>>>
>>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is
>>>   requested (and IoMmuAllocateBuffer() was called previously).
>>>   Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and
>>>   clears the memory encryption mask.
>>>
>>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>>   operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>>   encryption mask, and frees both the pages and MAP_INFO.
>>>
>
> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
> introduce list to track if the buffer was allocated by us. If buffer
> was allocated by us then we can safely say that it does not require a
> bounce buffer. While implementing the initial code I was not able to
> see BusMasterCommonBuffer usage. But with virtio things are getting a
> bit more clear in my mind.

The purpose of the internal list is a little bit different -- it is a
"free list", not a tracker list.

Under the proposed scheme, a MAP_INFO structure will have to be
allocated for *all* mappings: both for CommonBuffer (where the actual
pages come from the caller, who retrieved them earlier with an
AllocateBuffer() call), and for Read/Write (where the actual pages will
be allocated internally to Map). Allocating the MAP_INFO structure in
Map() is necessary in *both* cases because the Unmap() function can
*only* consult this MAP_INFO structure to learn the address and the size
at which it has to re-set the memory encryption mask. This is because
the Unmap() function gets no other input parameter.

Then, regardless of the original operation (CommonBuffer vs.
Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
CommonBuffer, the actual pages are owned by the caller, so we don't free
them here; for Read/Write, the actual pages are owned by Map/Unmap, so
we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
point is that MAP_INFO cannot be released with FreePool() because that
could change the UEFI memmap during ExitBootServices() processing. So
MAP_INFO has to be "released" to an internal *free* list.

And since we have an internal free list, the original MAP_INFO
allocation in Map() should consult the free list first, and only if it
is empty should it fall back to AllocatePool().

Whether the actual pages tracked by MAP_INFO were allocated internally
by Map(), or externally by the caller (via AllocateBuffer()) is an
independent question. (And it can be seen from the MAP_INFO.Operation
field.)

Does this make it clearer?

>
>>> This distribution of operations seems wrong. The key point is that
>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>> usable, and that client code is required to call Map()
>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>> operation. Therefore, the right distribution of operations is:
>>>
>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>   encryption mask..
>>>
>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>   encryption mask.
>>>
>
> Actually one of main reason why we cleared and restored the memory
> encryption mask during allocate/free is because we also consume the
> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA
> buffer. I am certainly open to suggestions.

Argh. That's again my fault then, I should have noticed it. :( I
apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
for me as well.

As discussed earlier (and confirmed by Ard), calling *just*
AllocateBuffer() is never enough, Map()/Unmap() are always necessary.

So here's what I suggest (to keep the changes localized):

- Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
  function to output a "VOID *Mapping" parameter as well, in addition to
  the address.

- Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
  function to take a "VOID *Mapping" parameter in addition to the buffer
  address.

- In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(),
  keep the current IOMMU AllocateBuffer() call, but also call IOMMU
  Map(), with CommonBuffer. Furthermore, propagate the Mapping output of
  Map() outwards. (Note that Map() may have to be called in a loop,
  dependent on "NumberOfBytes".)

- In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call
  the IOMMU Unmap() function first (using the new Mapping parameter).

>
> [1]
> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>
> [2]
> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>
>
>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>   requested, and it allocates pages (bounce buffer) otherwise.
>>>
>
> I am trying to wrap my head around how we can support
> BusMasterCommonBuffer when buffer was not allocated by us. Changing
> the memory encryption mask in a page table will not update the
> contents.

Thank you for the clarification. I've now tried to review the current
code for a better understanding. Are the below statements correct?

- For the guest, guest memory is always readable transparently.
- For the host, guest memory is always readable *as it was written
  last*.
- If the guest memory was last written with the C bit clear, then the
  host can read it as plaintext (regardless of the current state of the
  C bit).
- If the guest memory was last written with the C bit set, then the host
  can only read garbage (regardless of the current state of the C bit).

*If* this is the case, then:

(a) We already have a sort of guest->host information leak. Namely, in
IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is
restored, and the pages are released with gBS->FreePages(). However, the
pages being released are not rewritten in place (they are not actually
re-encrypted, only marked for encryption whenever they are next written
to). This means that pages released like this remain readable to the
hypervisor for an unpredictable time.

This is not necessarily a critical problem (after all the contents of
those pages were visible to the hypervisor at some point anyway!); but
in the longer term, such pages could accumulate, and if the hypervisor
is compromised later, it could potentially read an unbounded amount of
"past guest data" as plain-text.

(b) Plus, approaching the question from the Map() direction, we need to
consider two scenarios:

- Client code calls AllocateBuffer(), then Map(), and it writes to the
  buffer only then. This should be safe.
- client code calls AllocateBuffer(), writes to it, and then calls
  Map(). This will result in memory contents that look like garbage to
  the hypervisor. Bad.

I can imagine the following to handle these cases: in the Map() and
Unmap() functions, we have to decrypt and encrypt the memory contents
in-place, after changing the C bit (without allocating additional
memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this
will always remain in encrypted memory). Update the C bit with a single
function call for the entire range (like now) -- this will not affect
the guest-readability of the pages --, then bounce each page within the
range to the static buffer and back to its original place. In effect
this will in-place encrypt or decrypt the memory, and will be faster
than a byte-wise rewrite.

* BusMasterRead (host will want to read guest memory):
  - Client calls Map() without calling AllocateBuffer() first. Map()
    allocates an internal bounce buffer, clears the C bit, and does the
    copying.
  - Client fires off the PCI transaction.
  - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
    restores the C-bit, *zeros* the pages, and releases the pages.

* BusMasterWrite (host will want to write to guest memory):
  - Client calls Map() without calling AllocateBuffer() first. Map()
    allocates an internal bounce buffer and clears the C bit.
  - Client fires off the PCI transaction.
  - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
    copies the bounce buffer to crpyted (client-owned) memory, restores
    the C-bit, *zeros* the pages, and releases the pages.

* BusMasterCommonBuffer:
  - Client calls AllocateBuffer(), and places some data in the returned
    memory.
  - Client calls Map(). Map() clears the C bit in one fell swoop, and
    then decrypts the buffer in-place (by bouncing it page-wise to the
    static array and back).
  - Client communicates with the device.
  - Client calls Unmap(). Unmap() restores the C bit in one fell swoop,
    and encrypts the buffer in-place (by bouncing it page-wise to the
    static array and back).
  - Client reads some residual data from the buffer.
  - Client calls FreeBuffer(). FreeBuffer() relases the pages.

This is suitable for the discussed ExitBootServices() callback update
too, where the callback will reset the virtio device (like now) and then
call Unmap() for all shared areas, without calling FreeBuffer() on them.
The above logic will re-encrypt the contents without affecting the UEFI
memmap.

> Also since the memory encryption mask works on PAGE_SIZE hence
> changing the encryption mask on not our allocated buffer could mess
> things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).

This is not a problem for BusMasterRead and BusMasterWrite because for
those operations, Map() allocates the internal bounce buffer. Also not a
problem for BusMasterCommonBuffer, because then the client first calls
AllocateBuffer (whole number of pages), and so Map() can round up the
number of bytes and de-crypt full pages in-place (see above).

>
>>>    *Regardless* of BusMaster operation, the following actions are
>>>    carried out unconditionally:
>>>
>>>    . the memory encryption mask is cleared in this function (and in
>>>      this function only),
>>>
>>>    . An attempt is made to grab a MAP_INFO structure from an
>>>      internal free list (to be introduced!). The head of the list is
>>>      a new static variable. If the free list is empty, then a
>>>      MAP_INFO structure is allocated with AllocatePool(). The
>>>      NO_MAPPING macro becomes unused and can be deleted from the
>>>      source code.
>>>
>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For
>>>   this, it has to consult the MAP_INFO structure that is being
>>>   passed in from the caller.) In addition:
>>>
>>>    . If MapInfo->Operation is BusMasterCommonBuffer, then we know
>>>      the allocation was done separately in AllocateBuffer, so we do
>>>      not release the pages. Otherwise, we do release the pages.
>>>
>>>    . MapInfo is linked back on the internal free list (see above).
>>>      It is *never* released with FreePool().
>>>
>>>    This approach guarantees that IoMmuUnmap() can de-program the
>>>    IOMMU (= re-set the memory encryption mask) without changing the
>>>    UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will
>>>    not split page tables internally when it *re*sets the encryption
>>>    mask -- is that correct?)
>
> Yes, MemEncryptSevSetPageEncMask() will not split pages when it
> restores mask.

Great, thanks.
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago

On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
> On 07/27/17 21:00, Brijesh Singh wrote:
>> On 07/27/2017 12:56 PM, Ard Biesheuvel wrote:
>>> On 27 July 2017 at 18:16, Laszlo Ersek <lersek@redhat.com> wrote:
> 
>>>> Normally, Map allocates the bounce buffer internally, and Unmap
>>>> releases the bounce buffer internally (for BusMasterRead and
>>>> BusMasterWrite), which is not right here. If you use
>>>> AllocateBuffer() separately, then Map() -- with
>>>> BusMasterCommonBuffer -- will not allocate internally, and Unmap()
>>>> will also not deallocate internally. So, in the ExitBootServices()
>>>> callback, you will be able to call Unmap() only, and then *not* call
>>>> FreeBuffer() at all.
>>>>
>>>> This is why I suggest introducing all four functions (Allocate,
>>>> Free, Map, Unmap) to the VIRTIO_DEVICE_PROTOCOL, and why all virtio
>>>> drivers should use all four functions explicitly, not just Map and
>>>> Unmap.
>>>>
>>>> ... Actually, I think there is a bug in
>>>> "OvmfPkg/IoMmuDxe/AmdSevIoMmu.c". You have the following
>>>> distribution of operations at the moment:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and clears the memory
>>>>    encryption mask
>>>>
>>>> - IoMmuFreeBuffer() re-sets the memory encryption mask, and
>>>>    deallocates pages
>>>>
>>>> - IoMmuMap() does nothing at all when BusMasterCommonBuffer is
>>>>    requested (and IoMmuAllocateBuffer() was called previously).
>>>>    Otherwise, IoMmuMap() allocates pages, allocates MAP_INFO, and
>>>>    clears the memory encryption mask.
>>>>
>>>> - IoMmuUnmap() does nothing when cleaning up a BusMasterCommonBuffer
>>>>    operation (--> NO_MAPPING). Otherwise, IoMmuUnmap() clears the
>>>>    encryption mask, and frees both the pages and MAP_INFO.
>>>>
>>
>> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
>> introduce list to track if the buffer was allocated by us. If buffer
>> was allocated by us then we can safely say that it does not require a
>> bounce buffer. While implementing the initial code I was not able to
>> see BusMasterCommonBuffer usage. But with virtio things are getting a
>> bit more clear in my mind.
> 
> The purpose of the internal list is a little bit different -- it is a
> "free list", not a tracker list.
> 
> Under the proposed scheme, a MAP_INFO structure will have to be
> allocated for *all* mappings: both for CommonBuffer (where the actual
> pages come from the caller, who retrieved them earlier with an
> AllocateBuffer() call), and for Read/Write (where the actual pages will
> be allocated internally to Map). Allocating the MAP_INFO structure in
> Map() is necessary in *both* cases because the Unmap() function can
> *only* consult this MAP_INFO structure to learn the address and the size
> at which it has to re-set the memory encryption mask. This is because
> the Unmap() function gets no other input parameter.
> 
> Then, regardless of the original operation (CommonBuffer vs.
> Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
> CommonBuffer, the actual pages are owned by the caller, so we don't free
> them here; for Read/Write, the actual pages are owned by Map/Unmap, so
> we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
> point is that MAP_INFO cannot be released with FreePool() because that
> could change the UEFI memmap during ExitBootServices() processing. So
> MAP_INFO has to be "released" to an internal *free* list.
> 
> And since we have an internal free list, the original MAP_INFO
> allocation in Map() should consult the free list first, and only if it
> is empty should it fall back to AllocatePool().
> 
> Whether the actual pages tracked by MAP_INFO were allocated internally
> by Map(), or externally by the caller (via AllocateBuffer()) is an
> independent question. (And it can be seen from the MAP_INFO.Operation
> field.)
> 
> Does this make it clearer?
> 

It makes sense. thanks

One question, do we need to return EFI_NOT_SUPPORTED error when we get
request to map a buffer with CommonBuffer but the buffer was not
allocated using "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"?

At least as per spec, it seems if caller wants to map a buffer with
CommonBuffer then it must have called "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"


>>
>>>> This distribution of operations seems wrong. The key point is that
>>>> AllocateBuffer() *need not* result in a buffer that is immediately
>>>> usable, and that client code is required to call Map()
>>>> *unconditionally*, even if BusMasterCommonBuffer is the desired
>>>> operation. Therefore, the right distribution of operations is:
>>>>
>>>> - IoMmuAllocateBuffer() allocates pages and does not touch the
>>>>    encryption mask..
>>>>
>>>> - IoMmuFreeBuffer() deallocates pages and does not touch the
>>>>    encryption mask.
>>>>
>>
>> Actually one of main reason why we cleared and restored the memory
>> encryption mask during allocate/free is because we also consume the
>> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a DMA
>> buffer. I am certainly open to suggestions.
> 
> Argh. That's again my fault then, I should have noticed it. :( I
> apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
> for me as well.
> 
> As discussed earlier (and confirmed by Ard), calling *just*
> AllocateBuffer() is never enough, Map()/Unmap() are always necessary.
> 
> So here's what I suggest (to keep the changes localized):
> 
> - Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
>    function to output a "VOID *Mapping" parameter as well, in addition to
>    the address.
> 
> - Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
>    function to take a "VOID *Mapping" parameter in addition to the buffer
>    address.
> 
> - In the DXE implementation of InternalQemuFwCfgSevDmaAllocateBuffer(),
>    keep the current IOMMU AllocateBuffer() call, but also call IOMMU
>    Map(), with CommonBuffer. Furthermore, propagate the Mapping output of
>    Map() outwards. (Note that Map() may have to be called in a loop,
>    dependent on "NumberOfBytes".)
> 
> - In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(), call
>    the IOMMU Unmap() function first (using the new Mapping parameter).
> 

I will update the code and send patch for review. thanks

>>
>> [1]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L159
>>
>> [2]
>> https://github.com/tianocore/edk2/blob/master/OvmfPkg/Library/QemuFwCfgLib/QemuFwCfgDxe.c#L197
>>
>>
>>>> - IoMmuMap() does not allocate pages when BusMasterCommonBuffer is
>>>>    requested, and it allocates pages (bounce buffer) otherwise.
>>>>
>>
>> I am trying to wrap my head around how we can support
>> BusMasterCommonBuffer when buffer was not allocated by us. Changing
>> the memory encryption mask in a page table will not update the
>> contents.
> 
> Thank you for the clarification. I've now tried to review the current
> code for a better understanding. Are the below statements correct?
> 
> - For the guest, guest memory is always readable transparently.
> - For the host, guest memory is always readable *as it was written
>    last*.
> - If the guest memory was last written with the C bit clear, then the
>    host can read it as plaintext (regardless of the current state of the
>    C bit).
> - If the guest memory was last written with the C bit set, then the host
>    can only read garbage (regardless of the current state of the C bit).
> 

Your understanding is correct.

> *If* this is the case, then:
> 
> (a) We already have a sort of guest->host information leak. Namely, in
> IoMmuFreeBuffer() [OvmfPkg/IoMmuDxe/AmdSevIoMmu.c], the C bit is
> restored, and the pages are released with gBS->FreePages(). However, the
> pages being released are not rewritten in place (they are not actually
> re-encrypted, only marked for encryption whenever they are next written
> to). This means that pages released like this remain readable to the
> hypervisor for an unpredictable time.
> 
> This is not necessarily a critical problem (after all the contents of
> those pages were visible to the hypervisor at some point anyway!); but
> in the longer term, such pages could accumulate, and if the hypervisor
> is compromised later, it could potentially read an unbounded amount of
> "past guest data" as plain-text.
> 

Very good catch, I can *zero* the pages. IIRC, I did not consider doing
so because it may introduce unnecessary perform hit (mainly when dealing
with larger pages). I think when we load kernel or initrd using QemuFwCfg
DMA interface we allocate large buffers.

I will still go ahead and experiment *zeroing* page and measure the performance
impact before we integrate the solution.

  
> (b) Plus, approaching the question from the Map() direction, we need to
> consider two scenarios:
> 
> - Client code calls AllocateBuffer(), then Map(), and it writes to the
>    buffer only then. This should be safe.
> - client code calls AllocateBuffer(), writes to it, and then calls
>    Map(). This will result in memory contents that look like garbage to
>    the hypervisor. Bad.
> 
> I can imagine the following to handle these cases: in the Map() and
> Unmap() functions, we have to decrypt and encrypt the memory contents
> in-place, after changing the C bit (without allocating additional
> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes (this
> will always remain in encrypted memory). Update the C bit with a single
> function call for the entire range (like now) -- this will not affect
> the guest-readability of the pages --, then bounce each page within the
> range to the static buffer and back to its original place. In effect
> this will in-place encrypt or decrypt the memory, and will be faster
> than a byte-wise rewrite.
> 
> * BusMasterRead (host will want to read guest memory):
>    - Client calls Map() without calling AllocateBuffer() first. Map()
>      allocates an internal bounce buffer, clears the C bit, and does the
>      copying.
>    - Client fires off the PCI transaction.
>    - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
>      restores the C-bit, *zeros* the pages, and releases the pages.
> 

Yes this will work just fine (see my previous comments on *zeroing* pages)

> * BusMasterWrite (host will want to write to guest memory):
>    - Client calls Map() without calling AllocateBuffer() first. Map()
>      allocates an internal bounce buffer and clears the C bit.
>    - Client fires off the PCI transaction.
>    - Client calls Unmap(), without calling FreeBuffer() later. Unmap()
>      copies the bounce buffer to crpyted (client-owned) memory, restores
>      the C-bit, *zeros* the pages, and releases the pages.
> 

Yes, this will work just fine (see my previous comments on *zeroing* pages)

> * BusMasterCommonBuffer:
>    - Client calls AllocateBuffer(), and places some data in the returned
>      memory.
>    - Client calls Map(). Map() clears the C bit in one fell swoop, and
>      then decrypts the buffer in-place (by bouncing it page-wise to the
>      static array and back).
>    - Client communicates with the device.
>    - Client calls Unmap(). Unmap() restores the C bit in one fell swoop,
>      and encrypts the buffer in-place (by bouncing it page-wise to the
>      static array and back).
>    - Client reads some residual data from the buffer.
>    - Client calls FreeBuffer(). FreeBuffer() relases the pages.
> 

Yes this works fine as long as the client uses EFI_PCI_IO_PROTOCOL.AllocateBuffer()
to allocate the buffer.


> This is suitable for the discussed ExitBootServices() callback update
> too, where the callback will reset the virtio device (like now) and then
> call Unmap() for all shared areas, without calling FreeBuffer() on them.
> The above logic will re-encrypt the contents without affecting the UEFI
> memmap.
> 
>> Also since the memory encryption mask works on PAGE_SIZE hence
>> changing the encryption mask on not our allocated buffer could mess
>> things up (e.g if NumberOfBytes is not PAGE_SIZE aligned).
> 
> This is not a problem for BusMasterRead and BusMasterWrite because for
> those operations, Map() allocates the internal bounce buffer. Also not a
> problem for BusMasterCommonBuffer, because then the client first calls
> AllocateBuffer (whole number of pages), and so Map() can round up the
> number of bytes and de-crypt full pages in-place (see above).
> 
>>
>>>>     *Regardless* of BusMaster operation, the following actions are
>>>>     carried out unconditionally:
>>>>
>>>>     . the memory encryption mask is cleared in this function (and in
>>>>       this function only),
>>>>
>>>>     . An attempt is made to grab a MAP_INFO structure from an
>>>>       internal free list (to be introduced!). The head of the list is
>>>>       a new static variable. If the free list is empty, then a
>>>>       MAP_INFO structure is allocated with AllocatePool(). The
>>>>       NO_MAPPING macro becomes unused and can be deleted from the
>>>>       source code.
>>>>
>>>> - IoMmuUnmap() clears the encryption mask unconditionally. (For
>>>>    this, it has to consult the MAP_INFO structure that is being
>>>>    passed in from the caller.) In addition:
>>>>
>>>>     . If MapInfo->Operation is BusMasterCommonBuffer, then we know
>>>>       the allocation was done separately in AllocateBuffer, so we do
>>>>       not release the pages. Otherwise, we do release the pages.
>>>>
>>>>     . MapInfo is linked back on the internal free list (see above).
>>>>       It is *never* released with FreePool().
>>>>
>>>>     This approach guarantees that IoMmuUnmap() can de-program the
>>>>     IOMMU (= re-set the memory encryption mask) without changing the
>>>>     UEFI memory map. (I trust that MemEncryptSevSetPageEncMask() will
>>>>     not split page tables internally when it *re*sets the encryption
>>>>     mask -- is that correct?)
>>
>> Yes, MemEncryptSevSetPageEncMask() will not split pages when it
>> restores mask.
> 
> Great, thanks.
> Laszlo
> 
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 5 months ago
On 07/28/17 18:00, Brijesh Singh wrote:
> 
> 
> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
>> On 07/27/17 21:00, Brijesh Singh wrote:

>>> I agree with you, there is a bug in AmdSevIoMmu.c. I will fix it. And
>>> introduce list to track if the buffer was allocated by us. If buffer
>>> was allocated by us then we can safely say that it does not require a
>>> bounce buffer. While implementing the initial code I was not able to
>>> see BusMasterCommonBuffer usage. But with virtio things are getting a
>>> bit more clear in my mind.
>>
>> The purpose of the internal list is a little bit different -- it is a
>> "free list", not a tracker list.
>>
>> Under the proposed scheme, a MAP_INFO structure will have to be
>> allocated for *all* mappings: both for CommonBuffer (where the actual
>> pages come from the caller, who retrieved them earlier with an
>> AllocateBuffer() call), and for Read/Write (where the actual pages will
>> be allocated internally to Map). Allocating the MAP_INFO structure in
>> Map() is necessary in *both* cases because the Unmap() function can
>> *only* consult this MAP_INFO structure to learn the address and the size
>> at which it has to re-set the memory encryption mask. This is because
>> the Unmap() function gets no other input parameter.
>>
>> Then, regardless of the original operation (CommonBuffer vs.
>> Read/Write), the MAP_INFO structure has to be recycled in Unmap(). (For
>> CommonBuffer, the actual pages are owned by the caller, so we don't free
>> them here; for Read/Write, the actual pages are owned by Map/Unmap, so
>> we free them in Unmap() *in addition* to recycling MAP_INFO.) The main
>> point is that MAP_INFO cannot be released with FreePool() because that
>> could change the UEFI memmap during ExitBootServices() processing. So
>> MAP_INFO has to be "released" to an internal *free* list.
>>
>> And since we have an internal free list, the original MAP_INFO
>> allocation in Map() should consult the free list first, and only if it
>> is empty should it fall back to AllocatePool().
>>
>> Whether the actual pages tracked by MAP_INFO were allocated internally
>> by Map(), or externally by the caller (via AllocateBuffer()) is an
>> independent question. (And it can be seen from the MAP_INFO.Operation
>> field.)
>>
>> Does this make it clearer?
>>
> 
> It makes sense. thanks
> 
> One question, do we need to return EFI_NOT_SUPPORTED error when we get
> request to map a buffer with CommonBuffer but the buffer was not
> allocated using "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"?
> 
> At least as per spec, it seems if caller wants to map a buffer with
> CommonBuffer then it must have called
> "EFI_PCI_IO_PROTOCOL->AllocateBuffer (...)"

Yes, that is it, exactly: if the caller violates a requirement in the
UEFI spec, covering the use of the APIs, then the firmware *may* make an
attempt to detect that (and to reject it), but the firmware is not
*required* to do so.

A much simpler example: on a double-free programming error, the second
gBS->FreePool() call is not *required* to detect the problem. ("The
Buffer that is freed must have been allocated by AllocatePool().")

So, I don't think we need to implement measures for checking that a
CommonBuffer Map() actually refers to a previously returned, active,
AllocateBuffer() address.

(Snipping the rest, I've read it all, thanks for the answers. Nothing to
add this time :) )

Cheers!
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 5 months ago
On 07/28/17 18:00, Brijesh Singh wrote:
>
>
> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
>> On 07/27/17 21:00, Brijesh Singh wrote:

>>> Actually one of main reason why we cleared and restored the memory
>>> encryption mask during allocate/free is because we also consume the
>>> IOMMU protocol in QemuFwCfgLib as a method to allocate and free a
>>> DMA buffer. I am certainly open to suggestions.
>>
>> Argh. That's again my fault then, I should have noticed it. :( I
>> apologize, the Map/Unmap/AllocateBuffer/FreeBuffer APIs are a "first"
>> for me as well.
>>
>> As discussed earlier (and confirmed by Ard), calling *just*
>> AllocateBuffer() is never enough, Map()/Unmap() are always necessary.
>>
>> So here's what I suggest (to keep the changes localized):
>>
>> - Extend the prototype of InternalQemuFwCfgSevDmaAllocateBuffer()
>>   function to output a "VOID *Mapping" parameter as well, in addition
>>   to the address.
>>
>> - Modify the prototype of the InternalQemuFwCfgSevDmaFreeBuffer()
>>   function to take a "VOID *Mapping" parameter in addition to the
>>   buffer address.
>>
>> - In the DXE implementation of
>>   InternalQemuFwCfgSevDmaAllocateBuffer(), keep the current IOMMU
>>   AllocateBuffer() call, but also call IOMMU Map(), with
>>   CommonBuffer. Furthermore, propagate the Mapping output of Map()
>>   outwards. (Note that Map() may have to be called in a loop,
>>   dependent on "NumberOfBytes".)
>>
>> - In the DXE implementation of InternalQemuFwCfgSevDmaFreeBuffer(),
>>   call the IOMMU Unmap() function first (using the new Mapping
>>   parameter).
>>
>
> I will update the code and send patch for review. thanks

Here's an alternative, given that you mentioned being
performance-conscious. I'm not "suggesting" this as preferred, just
offering it for your consideration. Pick whichever you like more.

In this approach, I'm going to go back on my original request, namely to
keep the InternalQemuFwCfgDmaBytes() implementation centralized, in the
"QemuFwCfgLib.c" file. I now realize that the differences are large
enough that this may not have been a good idea. So here goes:

* Internal APIs ("QemuFwCfgLibInternal.h"):

  - Remove the InternalQemuFwCfgSev*() function prototypes.

  - Introduce the InternalQemuFwCfgDmaBytes() function prototype.

* SEC instance ("QemuFwCfgSec.c"):

  - Remove the InternalQemuFwCfgSev*() function definitions.

  - Add the InternalQemuFwCfgDmaBytes() function definition with
    ASSERT(FALSE) + CpuDeadLoop().

* PEI instance ("QemuFwCfgPei.c):

  - Remove the InternalQemuFwCfgSev*() function definitions.

  - Copy the InternalQemuFwCfgDmaBytes() function definition from the
    currently shared code ("QemuFwCfgLib.c"), and remove all branches
    that are related to the SEV enabled case. IOW, specialize the
    implementation to the plaintext case.

  - In QemuFwCfgInitialize(), replace the
    InternalQemuFwCfgSevIsEnabled() function call with a direct call to
    MemEncryptSevIsEnabled().

* DXE instance ("QemuFwCfgDxe.c"):

  - Remove the InternalQemuFwCfgSev*() function definitions.

  - Copy the InternalQemuFwCfgDmaBytes() function definition from the
    currently shared code ("QemuFwCfgLib.c"), as a starting point.

  - Replace the InternalQemuFwCfgSevIsEnabled() call with a direct call
    to MemEncryptSevIsEnabled().

  - When SEV is enabled, split the control area ("FW_CFG_DMA_ACCESS") to
    a separate buffer. This control area should be allocated with IOMMU
    AllocateBuffer(), and Map()ped separately, as
    BusMasterCommonBuffer64.

  - For the data transfer buffer, use *only* Map() and Unmap(), without
    AllocateBuffer() and FreeBuffer(). Use BusMasterRead64 and
    BusMasterWrite64, dependent on FW_CFG_DMA_CTL_WRITE /
    FW_CFG_DMA_CTL_READ. The point is that the potentially large data
    area will be bounced only once, because Map()/Unmap() will own the
    bounce buffer, and the in-place (de)crypting can be avoided. (The
    fw_cfg DMA transfer is completed in one synchronous operation, as
    witnessed by the library client.) The explicit CopyMem() calls can
    be removed.

  - Remove the InternalQemuFwCfgSevDmaAllocateBuffer() and
    InternalQemuFwCfgSevDmaFreeBuffer() calls.

* shared code ("QemuFwCfgLib.c"):

  - remove the InternalQemuFwCfgDmaBytes() function definition.

Thanks,
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 5 months ago
On 07/28/17 18:00, Brijesh Singh wrote:
> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:

snip

>> (b) Plus, approaching the question from the Map() direction, we need
>> to consider two scenarios:
>>
>> - Client code calls AllocateBuffer(), then Map(), and it writes to
>>   the buffer only then. This should be safe.
>> - client code calls AllocateBuffer(), writes to it, and then calls
>>   Map(). This will result in memory contents that look like garbage
>>   to the hypervisor. Bad.
>>
>> I can imagine the following to handle these cases: in the Map() and
>> Unmap() functions, we have to decrypt and encrypt the memory contents
>> in-place, after changing the C bit (without allocating additional
>> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes
>> (this will always remain in encrypted memory). Update the C bit with
>> a single function call for the entire range (like now) -- this will
>> not affect the guest-readability of the pages --, then bounce each
>> page within the range to the static buffer and back to its original
>> place. In effect this will in-place encrypt or decrypt the memory,
>> and will be faster than a byte-wise rewrite.

snip

>> * BusMasterCommonBuffer:
>>    - Client calls AllocateBuffer(), and places some data in the
>>      returned memory.
>>    - Client calls Map(). Map() clears the C bit in one fell swoop,
>>      and then decrypts the buffer in-place (by bouncing it page-wise
>>      to the static array and back).
>>    - Client communicates with the device.
>>    - Client calls Unmap(). Unmap() restores the C bit in one fell
>>      swoop, and encrypts the buffer in-place (by bouncing it
>>      page-wise to the static array and back).
>>    - Client reads some residual data from the buffer.
>>    - Client calls FreeBuffer(). FreeBuffer() relases the pages.
>>
>
> Yes this works fine as long as the client uses
> EFI_PCI_IO_PROTOCOL.AllocateBuffer() to allocate the buffer.

Again, a performance-oriented thought:

Above I suggested using a statically allocated page-sized buffer, for
the in-place encryption/decryption. Ultimately this means *two*
CopyMem()s for the entire buffer (just executed page-wise), in *each* of
Map() and Unmap().

Maybe we can do better: what if you perform the CopyMem() from the
buffer right back to the same buffer? CopyMem() is *required* to work
with overlapping source and target areas (similarly to memmove() in
standard C).

This would result in *one* CopyMem (for in-place de-/encryption) in each
of Map() and Unmap(), and thereby it would have identical performance
impact to the BusMasterRead and BusMasterWrite Map() operations (where
copying / crypting takes place between distinct memory areas).

The OVMF DSC files resolve "BaseMemoryLib" -- which provides CopyMem()
-- to "MdePkg/Library/BaseMemoryLibRepStr/BaseMemoryLibRepStr.inf";
regardless of module type. The actual implementation appears to reside
in "MdePkg/Library/BaseMemoryLibRepStr/X64/CopyMem.nasm":

> global ASM_PFX(InternalMemCopyMem)
> ASM_PFX(InternalMemCopyMem):
>     push    rsi
>     push    rdi
>     mov     rsi, rdx                    ; rsi <- Source
>     mov     rdi, rcx                    ; rdi <- Destination
>     lea     r9, [rsi + r8 - 1]          ; r9 <- End of Source
>     cmp     rsi, rdi
>     mov     rax, rdi                    ; rax <- Destination as return value
>     jae     .0
>     cmp     r9, rdi
>     jae     @CopyBackward               ; Copy backward if overlapped
> .0:
>     mov     rcx, r8
>     and     r8, 7
>     shr     rcx, 3
>     rep     movsq                       ; Copy as many Qwords as possible
>     jmp     @CopyBytes
> @CopyBackward:
>     mov     rsi, r9                     ; rsi <- End of Source
>     lea     rdi, [rdi + r8 - 1]         ; esi <- End of Destination
>     std                                 ; set direction flag
> @CopyBytes:
>     mov     rcx, r8
>     rep     movsb                       ; Copy bytes backward
>     cld
>     pop     rdi
>     pop     rsi
>     ret
>

However, I'm afraid even if this works on SEV (which I certainly
expect!), this code won't be reached, due to the following CopyMem()
wrapper implementation in
"MdePkg/Library/BaseMemoryLibRepStr/CopyMemWrapper.c":

> VOID *
> EFIAPI
> CopyMem (
>   OUT VOID       *DestinationBuffer,
>   IN CONST VOID  *SourceBuffer,
>   IN UINTN       Length
>   )
> {
>   if (Length == 0) {
>     return DestinationBuffer;
>   }
>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)DestinationBuffer));
>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)SourceBuffer));
>
>   if (DestinationBuffer == SourceBuffer) {
>     return DestinationBuffer;
>   }
>   return InternalMemCopyMem (DestinationBuffer, SourceBuffer, Length);
> }

As you see, (DestinationBuffer == SourceBuffer) is handled as a no-op
(quite justifiedly, except in the case of SEV).

Personally I think it would be OK to copy the wrapper function and the
assembly code to OvmfPkg/IoMmuDxe/X64, under the names SevCopyMem() and
InternalSevCopyMem(), and call SevCopyMem() in the CommonBuffer cases of
Map() and Unmap(), for the in-place flipping.

For the 32-bit case (OvmfPkgIa32.dsc), my understanding is that guests
cannot control the C bit at all (there is no C bit in the PTEs), and
memory is always encrypted. Is that correct? If so, then we only need to
ensure that SevCopyMem() compile, as it will never be called -- in the
entry point function of OvmfPkg/IoMmuDxe, MemEncryptSevIsEnabled() will
return FALSE, and so the IOMMU protocol will not be installed. Therefore
the 32-bit version (under OvmfPkg/IoMmuDxe/Ia32) of SevCopyMem() can be
stubbed out as an ASSERT(FALSE)+CpuDeadLoop().

If you can think of a better location for SevCopyMem(), that's fine as
well. For example, you could add it to
"OvmfPkg/Library/BaseMemEncryptSevLib" as well.

... I don't think this functionality should be added under MdePkg,
because it is *very* special to the IOMMU implementation, and
practically no other module should use a "busy" in-place CopyMem().

Thanks
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago

On 7/28/17 2:59 PM, Laszlo Ersek wrote:
> On 07/28/17 18:00, Brijesh Singh wrote:
>> On 07/28/2017 08:38 AM, Laszlo Ersek wrote:
> snip
>
>>> (b) Plus, approaching the question from the Map() direction, we need
>>> to consider two scenarios:
>>>
>>> - Client code calls AllocateBuffer(), then Map(), and it writes to
>>>   the buffer only then. This should be safe.
>>> - client code calls AllocateBuffer(), writes to it, and then calls
>>>   Map(). This will result in memory contents that look like garbage
>>>   to the hypervisor. Bad.
>>>
>>> I can imagine the following to handle these cases: in the Map() and
>>> Unmap() functions, we have to decrypt and encrypt the memory contents
>>> in-place, after changing the C bit (without allocating additional
>>> memory). Introduce a static UINT8 array with EFI_PAGE_SIZE bytes
>>> (this will always remain in encrypted memory). Update the C bit with
>>> a single function call for the entire range (like now) -- this will
>>> not affect the guest-readability of the pages --, then bounce each
>>> page within the range to the static buffer and back to its original
>>> place. In effect this will in-place encrypt or decrypt the memory,
>>> and will be faster than a byte-wise rewrite.
> snip
>
>>> * BusMasterCommonBuffer:
>>>    - Client calls AllocateBuffer(), and places some data in the
>>>      returned memory.
>>>    - Client calls Map(). Map() clears the C bit in one fell swoop,
>>>      and then decrypts the buffer in-place (by bouncing it page-wise
>>>      to the static array and back).
>>>    - Client communicates with the device.
>>>    - Client calls Unmap(). Unmap() restores the C bit in one fell
>>>      swoop, and encrypts the buffer in-place (by bouncing it
>>>      page-wise to the static array and back).
>>>    - Client reads some residual data from the buffer.
>>>    - Client calls FreeBuffer(). FreeBuffer() relases the pages.
>>>
>> Yes this works fine as long as the client uses
>> EFI_PCI_IO_PROTOCOL.AllocateBuffer() to allocate the buffer.
> Again, a performance-oriented thought:
>
> Above I suggested using a statically allocated page-sized buffer, for
> the in-place encryption/decryption. Ultimately this means *two*
> CopyMem()s for the entire buffer (just executed page-wise), in *each* of
> Map() and Unmap().
>
> Maybe we can do better: what if you perform the CopyMem() from the
> buffer right back to the same buffer? CopyMem() is *required* to work
> with overlapping source and target areas (similarly to memmove() in
> standard C).
>
> This would result in *one* CopyMem (for in-place de-/encryption) in each
> of Map() and Unmap(), and thereby it would have identical performance
> impact to the BusMasterRead and BusMasterWrite Map() operations (where
> copying / crypting takes place between distinct memory areas).
>
> The OVMF DSC files resolve "BaseMemoryLib" -- which provides CopyMem()
> -- to "MdePkg/Library/BaseMemoryLibRepStr/BaseMemoryLibRepStr.inf";
> regardless of module type. The actual implementation appears to reside
> in "MdePkg/Library/BaseMemoryLibRepStr/X64/CopyMem.nasm":
>

AMD APM document a procedure which must be used to perform in-place
encryption/decryption. We must follow those steps to ensure that data is
flush into memory using the correct C-bit. Not doing so  may result in
unpredictable results.

http://support.amd.com/TechDocs/24593.pdf (Section 7.10.8)

>> global ASM_PFX(InternalMemCopyMem)
>> ASM_PFX(InternalMemCopyMem):
>>     push    rsi
>>     push    rdi
>>     mov     rsi, rdx                    ; rsi <- Source
>>     mov     rdi, rcx                    ; rdi <- Destination
>>     lea     r9, [rsi + r8 - 1]          ; r9 <- End of Source
>>     cmp     rsi, rdi
>>     mov     rax, rdi                    ; rax <- Destination as return value
>>     jae     .0
>>     cmp     r9, rdi
>>     jae     @CopyBackward               ; Copy backward if overlapped
>> .0:
>>     mov     rcx, r8
>>     and     r8, 7
>>     shr     rcx, 3
>>     rep     movsq                       ; Copy as many Qwords as possible
>>     jmp     @CopyBytes
>> @CopyBackward:
>>     mov     rsi, r9                     ; rsi <- End of Source
>>     lea     rdi, [rdi + r8 - 1]         ; esi <- End of Destination
>>     std                                 ; set direction flag
>> @CopyBytes:
>>     mov     rcx, r8
>>     rep     movsb                       ; Copy bytes backward
>>     cld
>>     pop     rdi
>>     pop     rsi
>>     ret
>>
> However, I'm afraid even if this works on SEV (which I certainly
> expect!), this code won't be reached, due to the following CopyMem()
> wrapper implementation in
> "MdePkg/Library/BaseMemoryLibRepStr/CopyMemWrapper.c":
>
>> VOID *
>> EFIAPI
>> CopyMem (
>>   OUT VOID       *DestinationBuffer,
>>   IN CONST VOID  *SourceBuffer,
>>   IN UINTN       Length
>>   )
>> {
>>   if (Length == 0) {
>>     return DestinationBuffer;
>>   }
>>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)DestinationBuffer));
>>   ASSERT ((Length - 1) <= (MAX_ADDRESS - (UINTN)SourceBuffer));
>>
>>   if (DestinationBuffer == SourceBuffer) {
>>     return DestinationBuffer;
>>   }
>>   return InternalMemCopyMem (DestinationBuffer, SourceBuffer, Length);
>> }
> As you see, (DestinationBuffer == SourceBuffer) is handled as a no-op
> (quite justifiedly, except in the case of SEV).
>
> Personally I think it would be OK to copy the wrapper function and the
> assembly code to OvmfPkg/IoMmuDxe/X64, under the names SevCopyMem() and
> InternalSevCopyMem(), and call SevCopyMem() in the CommonBuffer cases of
> Map() and Unmap(), for the in-place flipping.
>
> For the 32-bit case (OvmfPkgIa32.dsc), my understanding is that guests
> cannot control the C bit at all (there is no C bit in the PTEs), and
> memory is always encrypted. Is that correct? If so, then we only need to
> ensure that SevCopyMem() compile, as it will never be called -- in the
> entry point function of OvmfPkg/IoMmuDxe, MemEncryptSevIsEnabled() will
> return FALSE, and so the IOMMU protocol will not be installed. Therefore
> the 32-bit version (under OvmfPkg/IoMmuDxe/Ia32) of SevCopyMem() can be
> stubbed out as an ASSERT(FALSE)+CpuDeadLoop().
>
> If you can think of a better location for SevCopyMem(), that's fine as
> well. For example, you could add it to
> "OvmfPkg/Library/BaseMemEncryptSevLib" as well.
>
> ... I don't think this functionality should be added under MdePkg,
> because it is *very* special to the IOMMU implementation, and
> practically no other module should use a "busy" in-place CopyMem().
>
> Thanks
> Laszlo

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Brijesh Singh 7 years, 5 months ago
On 7/28/17 7:52 PM, Brijesh Singh wrote:
[snip]
> AMD APM document a procedure which must be used to perform in-place
> encryption/decryption. We must follow those steps to ensure that data is
> flush into memory using the correct C-bit. Not doing so  may result in
> unpredictable results.
>
> http://support.amd.com/TechDocs/24593.pdf (Section 7.10.8)
I am wondering if UEFI provides APIs to  get two linear mapping of the
same physical address. The steps says we create two mapping of same
physical address with different C-bits. I will look into UEFI docs to
see if something like this exist otherwise we have to consider two
memcpy. Since its just for CommandBuffers (which usually are smaller
hence we may be okay from performance point of view. Also its just a
boot time thing, does not get used when guest OS is takes over. I will
investigate and see what works best without adding extra complexity in
the code :)

-Brijesh
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Re: [edk2] [RFC v1 0/3] Add VIRTIO_F_IOMMU_PLATFORM support
Posted by Laszlo Ersek 7 years, 4 months ago
On 07/29/17 03:37, Brijesh Singh wrote:
> On 7/28/17 7:52 PM, Brijesh Singh wrote:
> [snip]
>> AMD APM document a procedure which must be used to perform in-place
>> encryption/decryption. We must follow those steps to ensure that data is
>> flush into memory using the correct C-bit. Not doing so  may result in
>> unpredictable results.
>>
>> http://support.amd.com/TechDocs/24593.pdf (Section 7.10.8)
> I am wondering if UEFI provides APIs to  get two linear mapping of the
> same physical address. The steps says we create two mapping of same
> physical address with different C-bits. I will look into UEFI docs to
> see if something like this exist otherwise we have to consider two
> memcpy. Since its just for CommandBuffers (which usually are smaller
> hence we may be okay from performance point of view.

Right, the separate static bounce buffer (4KB in size) and the two
matching CopyMem()s look easier to understand and safer. Hopefully there
won't be a performance issue in practice.

> Also its just a
> boot time thing, does not get used when guest OS is takes over. I will
> investigate and see what works best without adding extra complexity in
> the code :)

Thanks! (Sorry about the late reply.)
Laszlo
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel