From: Michal Privoznik
To: libvir-list@redhat.com
Cc: zack.cornelius@kove.net, ehabkost@redhat.com
Date: Fri, 20 Apr 2018 11:09:31 +0200
Subject: [libvirt] [PATCH v4 5/5] qemu: Introduce memoryBacking/discard

https://bugzilla.redhat.com/show_bug.cgi?id=1480668

QEMU has a new feature, memory-backend-file.discard-data=yes, which
is a nifty optimization. When qemu is quitting or on memory
hot-unplug it calls munmap() and close() on the file that is backing
the memory. However, this does not mean the kernel will stop touching
that part of memory. It still might. With this feature enabled we
tell the kernel: "we don't need this memory nor the data stored in
it". This makes the kernel drop the memory immediately without trying
to sync it with the mapped file.

Unfortunately, this cannot be turned on by default because we can't
be sure when users really don't care about what happens to data after
qemu dies. So it has to be opt-in.

As usual, there are three places where one can configure memory
attributes (<memoryBacking/>, the NUMA <cell/> and the <memory/>
device). This patch adds the feature to all of them.
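The lookup order across those three places can be sketched as below. This is a minimal standalone illustration, not code from the patch: discard_t, discard_parse() and discard_resolve() are hypothetical names that only mirror the tristate (absent/yes/no) handling this patch adds to qemuBuildMemoryBackendStr(). The assumption is that the most specific setting wins: the <memory/> device first, then its target NUMA cell, then <memoryBacking/>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three-state value modelled on libvirt's virTristateBool. The names
 * are local to this sketch (hypothetical), not libvirt identifiers. */
typedef enum {
    DISCARD_ABSENT = 0, /* not configured at this level */
    DISCARD_YES,
    DISCARD_NO
} discard_t;

/* Parse a discard='...' attribute value. A missing attribute (NULL)
 * maps to DISCARD_ABSENT. Anything other than "yes" or "no" is an
 * error, similar to how the patch rejects parse results <= 0. */
int discard_parse(const char *value, discard_t *out)
{
    assert(out != NULL);

    if (value == NULL) {
        *out = DISCARD_ABSENT;
        return 0;
    }
    if (strcmp(value, "yes") == 0) {
        *out = DISCARD_YES;
        return 0;
    }
    if (strcmp(value, "no") == 0) {
        *out = DISCARD_NO;
        return 0;
    }
    return -1;
}

/* Resolve the effective setting for one memory backend.
 * dimm:   value from the <memory/> device
 * cell:   value from the target NUMA <cell/>, or DISCARD_ABSENT when
 *         the device has no target node
 * domain: DISCARD_YES when <memoryBacking> contains <discard/>,
 *         DISCARD_ABSENT otherwise
 * Each level is consulted only if the previous one left it unset. */
discard_t discard_resolve(discard_t dimm, discard_t cell, discard_t domain)
{
    discard_t result = dimm;

    if (result == DISCARD_ABSENT)
        result = cell;
    if (result == DISCARD_ABSENT)
        result = domain;

    return result;
}
```

For instance, discard_resolve(DISCARD_ABSENT, DISCARD_NO, DISCARD_YES) yields DISCARD_NO: a cell with discard='no' overrides a domain-wide <discard/>. Only a final DISCARD_YES would translate into discard-data=yes on the QEMU command line.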

Signed-off-by: Michal Privoznik
---
 docs/formatdomain.html.in                    | 34 ++++++++++++++++++++++--
 docs/schemas/cputypes.rng                    |  5 ++++
 docs/schemas/domaincommon.rng                | 10 +++++++
 src/conf/domain_conf.c                       | 39 ++++++++++++++++++++++++++--
 src/conf/domain_conf.h                       |  3 +++
 src/conf/numa_conf.c                         | 27 +++++++++++++++++++
 src/conf/numa_conf.h                         |  3 +++
 src/libvirt_private.syms                     |  1 +
 src/qemu/qemu_command.c                      | 27 ++++++++++++++++---
 tests/qemuxml2argvdata/hugepages-pages7.args |  3 ++-
 tests/qemuxml2argvdata/hugepages-pages7.xml  |  4 +--
 tests/qemuxml2argvtest.c                     |  3 ++-
 12 files changed, 148 insertions(+), 11 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index ada0df227f..ea9d77bd18 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1016,6 +1016,7 @@
     <source type="file|anonymous"/>
     <access mode="shared|private"/>
     <allocation mode="immediate|ondemand"/>
+    <discard/>
   </memoryBacking>
   ...
 </domain>
@@ -1070,6 +1071,14 @@
 numa node by memAccess
allocation
Specify when allocate the memory
+discard
+When set and supported by hypervisor the memory
+content is discarded just before guest shuts down (or
+when DIMM module is unplugged). Please note that this is
+just an optimization and is not guaranteed to work in
+all cases (e.g. when hypervisor crashes).
+Since 4.3.0 (QEMU/KVM only)
+


@@ -1608,7 +1617,7 @@
 <cpu>
   ...
   <numa>
-    <cell id='0' cpus='0-3' memory='512000' unit='KiB'/>
+    <cell id='0' cpus='0-3' memory='512000' unit='KiB' discard='yes'/>
     <cell id='1' cpus='4-7' memory='512000' unit='KiB' memAccess='shared'/>
   </numa>
   ...
@@ -1634,6 +1643,13 @@
 memAccess can control whether the memory is to be
 mapped as "shared" or "private". This is valid only for
 hugepages-backed memory and nvdimm modules.
+
+Each cell element can have an optional
+discard attribute which fine tunes the discard
+feature for given numa node as described under
+Memory Backing.
+Accepted values are yes and no.
+Since 4.3.0



@@ -7849,7 +7865,7 @@ qemu-kvm -net nic,model=? /dev/null

 ...
 <devices>
-  <memory model='dimm' access='private'>
+  <memory model='dimm' access='private' discard='yes'>
     <target>
       <size unit='KiB'>524287</size>
       <node>0</node>
@@ -7903,6 +7919,20 @@ qemu-kvm -net nic,model=? /dev/null
         


+discard
+
+
+An optional attribute discard
+(since 4.3.0) that provides
+capability to fine tune discard of data on per module
+basis. Accepted values are yes and
+no. The feature is described here:
+Memory Backing.
+This attribute is allowed only for
+model='dimm'.
+
+
+
source

diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index c45b6dfb28..1f1e0e36d5 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -129,6 +129,11 @@
+
+
+
+
+
diff --git a/docs/schemas/domaincommon.rng b/docs/schemas/domaincommon.rng
index 4cab55f05d..9650a779b7 100644
--- a/docs/schemas/domaincommon.rng
+++ b/docs/schemas/domaincommon.rng
@@ -633,6 +633,11 @@
+
+
+
+
+
@@ -5138,6 +5143,11 @@
+
+
+
+
+
diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 35666c1347..9585e38bc1 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -5508,6 +5508,20 @@ virDomainVideoDefValidate(const virDomainVideoDef *video)
 }
 
 
+static int
+virDomainMemoryDefValidate(const virDomainMemoryDef *mem)
+{
+    if (mem->model == VIR_DOMAIN_MEMORY_MODEL_NVDIMM &&
+        mem->discard == VIR_TRISTATE_BOOL_YES) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("discard is not supported for nvdimms"));
+        return -1;
+    }
+
+    return 0;
+}
+
+
 static int
 virDomainDeviceDefValidateInternal(const virDomainDeviceDef *dev,
                                    const virDomainDef *def)
@@ -5540,6 +5554,9 @@ virDomainDeviceDefValidateInternal(const virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_VIDEO:
         return virDomainVideoDefValidate(dev->data.video);
 
+    case VIR_DOMAIN_DEVICE_MEMORY:
+        return virDomainMemoryDefValidate(dev->data.memory);
+
     case VIR_DOMAIN_DEVICE_LEASE:
     case VIR_DOMAIN_DEVICE_FS:
     case VIR_DOMAIN_DEVICE_INPUT:
@@ -5552,7 +5569,6 @@ virDomainDeviceDefValidateInternal(const virDomainDeviceDef *dev,
     case VIR_DOMAIN_DEVICE_SHMEM:
     case VIR_DOMAIN_DEVICE_TPM:
     case VIR_DOMAIN_DEVICE_PANIC:
-    case VIR_DOMAIN_DEVICE_MEMORY:
     case VIR_DOMAIN_DEVICE_IOMMU:
     case VIR_DOMAIN_DEVICE_NONE:
     case VIR_DOMAIN_DEVICE_LAST:
@@ -15613,6 +15629,16 @@ virDomainMemoryDefParseXML(virDomainXMLOptionPtr xmlopt,
     }
     VIR_FREE(tmp);
 
+    if ((tmp = virXMLPropString(memdevNode, "discard"))) {
+        if ((val = virTristateBoolTypeFromString(tmp)) <= 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("invalid discard value '%s'"), tmp);
+            goto error;
+        }
+
+        def->discard = val;
+    }
+
     /* source */
     if ((node = virXPathNode("./source", ctxt)) &&
         virDomainMemorySourceDefParseXML(node, ctxt, def) < 0)
@@ -18939,6 +18965,9 @@ virDomainDefParseXML(xmlDocPtr xml,
     if (virXPathBoolean("boolean(./memoryBacking/locked)", ctxt))
         def->mem.locked = true;
 
+    if (virXPathBoolean("boolean(./memoryBacking/discard)", ctxt))
+        def->mem.discard = VIR_TRISTATE_BOOL_YES;
+
     /* Extract blkio cgroup tunables */
     if (virXPathUInt("string(./blkiotune/weight)", ctxt,
                      &def->blkio.weight) < 0)
@@ -25196,6 +25225,9 @@ virDomainMemoryDefFormat(virBufferPtr buf,
     if (def->access)
         virBufferAsprintf(buf, " access='%s'",
                           virDomainMemoryAccessTypeToString(def->access));
+    if (def->discard)
+        virBufferAsprintf(buf, " discard='%s'",
+                          virTristateBoolTypeToString(def->discard));
     virBufferAddLit(buf, ">\n");
     virBufferAdjustIndent(buf, 2);
 
@@ -26658,7 +26690,8 @@ virDomainDefFormatInternal(virDomainDefPtr def,
     }
 
     if (def->mem.nhugepages || def->mem.nosharepages || def->mem.locked
-        || def->mem.source || def->mem.access || def->mem.allocation)
+        || def->mem.source || def->mem.access || def->mem.allocation
+        || def->mem.discard)
     {
         virBufferAddLit(buf, "\n");
         virBufferAdjustIndent(buf, 2);
@@ -26677,6 +26710,8 @@ virDomainDefFormatInternal(virDomainDefPtr def,
         if (def->mem.allocation)
             virBufferAsprintf(buf, "\n",
                               virDomainMemoryAllocationTypeToString(def->mem.allocation));
+        if (def->mem.discard)
+            virBufferAddLit(buf, "\n");
 
         virBufferAdjustIndent(buf, -2);
         virBufferAddLit(buf, "\n");
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 3c7eccb8ca..52d29124f1 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2107,6 +2107,7 @@ typedef enum {
 
 struct _virDomainMemoryDef {
     virDomainMemoryAccess access;
+    int discard; /* enum virTristateBool */
 
     /* source */
     virBitmapPtr sourceNodes;
@@ -2269,6 +2270,8 @@ struct _virDomainMemtune {
     int source; /* enum virDomainMemorySource */
     int access; /* enum virDomainMemoryAccess */
     int allocation; /* enum virDomainMemoryAllocation */
+
+    int discard; /* enum virTristateBool */
 };
 
 typedef struct _virDomainPowerManagement virDomainPowerManagement;
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index 9307dd93d3..a1bbcfa945 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -77,6 +77,7 @@ struct _virDomainNuma {
         virBitmapPtr nodeset; /* host memory nodes where this guest node resides */
         virDomainNumatuneMemMode mode; /* memory mode selection */
         virDomainMemoryAccess memAccess; /* shared memory access configuration */
+        int discard; /* discard-data for memory-backend-file, virTristateBool */
 
         struct _virDomainNumaDistance {
             unsigned int value; /* locality value for node i->j or j->i */
@@ -947,6 +948,18 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
             VIR_FREE(tmp);
         }
 
+        if ((tmp = virXMLPropString(nodes[i], "discard"))) {
+            if ((rc = virTristateBoolTypeFromString(tmp)) <= 0) {
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+                               _("Invalid 'discard' attribute value '%s'"),
+                               tmp);
+                goto cleanup;
+            }
+
+            def->mem_nodes[cur_cell].discard = rc;
+            VIR_FREE(tmp);
+        }
+
         /* Parse NUMA distances info */
         if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0)
             goto cleanup;
@@ -967,6 +980,7 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
                              virDomainNumaPtr def)
 {
     virDomainMemoryAccess memAccess;
+    int discard;
     char *cpustr;
     size_t ncells = virDomainNumaGetNodeCount(def);
     size_t i;
@@ -980,6 +994,7 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
         int ndistances;
 
         memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i);
+        discard = virDomainNumaGetNodeDiscard(def, i);
 
         if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i))))
             return -1;
@@ -994,6 +1009,10 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
             virBufferAsprintf(buf, " memAccess='%s'",
                               virDomainMemoryAccessTypeToString(memAccess));
 
+        if (discard)
+            virBufferAsprintf(buf, " discard='%s'",
+                              virTristateBoolTypeToString(discard));
+
         ndistances = def->mem_nodes[i].ndistances;
         if (ndistances == 0) {
             virBufferAddLit(buf, "/>\n");
@@ -1304,6 +1323,14 @@ virDomainNumaGetNodeMemoryAccessMode(virDomainNumaPtr numa,
 }
 
 
+int
+virDomainNumaGetNodeDiscard(virDomainNumaPtr numa,
+                            size_t node)
+{
+    return numa->mem_nodes[node].discard;
+}
+
+
 unsigned long long
 virDomainNumaGetNodeMemorySize(virDomainNumaPtr numa,
                                size_t node)
diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h
index 7947fdb219..6d8f484f73 100644
--- a/src/conf/numa_conf.h
+++ b/src/conf/numa_conf.h
@@ -102,6 +102,9 @@ virBitmapPtr virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
 virDomainMemoryAccess virDomainNumaGetNodeMemoryAccessMode(virDomainNumaPtr numa,
                                                            size_t node)
     ATTRIBUTE_NONNULL(1);
+int virDomainNumaGetNodeDiscard(virDomainNumaPtr numa,
+                                size_t node)
+    ATTRIBUTE_NONNULL(1);
 unsigned long long virDomainNumaGetNodeMemorySize(virDomainNumaPtr numa,
                                                   size_t node)
     ATTRIBUTE_NONNULL(1);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index b31f599bd2..d3d0495e42 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -750,6 +750,7 @@ virDomainNumaGetMaxCPUID;
 virDomainNumaGetMemorySize;
 virDomainNumaGetNodeCount;
 virDomainNumaGetNodeCpumask;
+virDomainNumaGetNodeDiscard;
 virDomainNumaGetNodeDistance;
 virDomainNumaGetNodeMemoryAccessMode;
 virDomainNumaGetNodeMemorySize;
diff --git a/src/qemu/qemu_command.c b/src/qemu/qemu_command.c
index b666f3715f..4964c32aeb 100644
--- a/src/qemu/qemu_command.c
+++ b/src/qemu/qemu_command.c
@@ -3010,6 +3010,7 @@ qemuBuildMemoryBackendStr(virJSONValuePtr *backendProps,
     unsigned long long pagesize = mem->pagesize;
     bool needHugepage = !!pagesize;
     bool useHugepage = !!pagesize;
+    int discard = mem->discard;
 
     /* The difference between @needHugepage and @useHugepage is that the latter
      * is true whenever huge page is defined for the current memory cell.
@@ -3020,8 +3021,7 @@ qemuBuildMemoryBackendStr(virJSONValuePtr *backendProps,
     *backendProps = NULL;
     *backendType = NULL;
 
-    if (memAccess == VIR_DOMAIN_MEMORY_ACCESS_DEFAULT &&
-        mem->targetNode >= 0) {
+    if (mem->targetNode >= 0) {
         /* memory devices could provide a invalid guest node */
         if (mem->targetNode >= virDomainNumaGetNodeCount(def->numa)) {
             virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
@@ -3031,12 +3031,19 @@ qemuBuildMemoryBackendStr(virJSONValuePtr *backendProps,
             return -1;
         }
 
-        memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, mem->targetNode);
+        if (memAccess == VIR_DOMAIN_MEMORY_ACCESS_DEFAULT)
+            memAccess = virDomainNumaGetNodeMemoryAccessMode(def->numa, mem->targetNode);
+
+        if (discard == VIR_TRISTATE_BOOL_ABSENT)
+            discard = virDomainNumaGetNodeDiscard(def->numa, mem->targetNode);
     }
 
     if (memAccess == VIR_DOMAIN_MEMORY_ACCESS_DEFAULT)
         memAccess = def->mem.access;
 
+    if (discard == VIR_TRISTATE_BOOL_ABSENT)
+        discard = def->mem.discard;
+
     if (virDomainNumatuneGetMode(def->numa, mem->targetNode, &mode) < 0 &&
         virDomainNumatuneGetMode(def->numa, -1, &mode) < 0)
         mode = VIR_DOMAIN_NUMATUNE_MEM_STRICT;
@@ -3124,6 +3131,20 @@ qemuBuildMemoryBackendStr(virJSONValuePtr *backendProps,
                                   NULL) < 0)
             goto cleanup;
 
+        if (!mem->nvdimmPath &&
+            discard == VIR_TRISTATE_BOOL_YES) {
+            if (!virQEMUCapsGet(qemuCaps, QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD)) {
+                virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                               _("this QEMU doesn't support memory discard"));
+                goto cleanup;
+            }
+
+            if (virJSONValueObjectAdd(props,
+                                      "B:discard-data", true,
+                                      NULL) < 0)
+                goto cleanup;
+        }
+
         switch (memAccess) {
         case VIR_DOMAIN_MEMORY_ACCESS_SHARED:
             if (virJSONValueObjectAdd(props, "b:share", true, NULL) < 0)
diff --git a/tests/qemuxml2argvdata/hugepages-pages7.args b/tests/qemuxml2argvdata/hugepages-pages7.args
index 1cb598d692..02a98026eb 100644
--- a/tests/qemuxml2argvdata/hugepages-pages7.args
+++ b/tests/qemuxml2argvdata/hugepages-pages7.args
@@ -18,7 +18,8 @@ mem-path=/dev/hugepages1G/libvirt/qemu/-1-fedora,size=1073741824,\
 host-nodes=1-3,policy=bind \
 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \
 -object memory-backend-file,id=memdimm1,prealloc=yes,\
-mem-path=/dev/hugepages2M/libvirt/qemu/-1-fedora,share=no,size=536870912 \
+mem-path=/dev/hugepages2M/libvirt/qemu/-1-fedora,discard-data=yes,share=no,\
+size=536870912 \
 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \
 -uuid 63840878-0deb-4095-97e6-fc444d9bc9fa \
 -display none \
diff --git a/tests/qemuxml2argvdata/hugepages-pages7.xml b/tests/qemuxml2argvdata/hugepages-pages7.xml
index d75cf5afa3..28c72f85a7 100644
--- a/tests/qemuxml2argvdata/hugepages-pages7.xml
+++ b/tests/qemuxml2argvdata/hugepages-pages7.xml
@@ -43,7 +43,7 @@

-
+
 1-3
 1048576
@@ -54,7 +54,7 @@
-
+
 524287
 0
diff --git a/tests/qemuxml2argvtest.c b/tests/qemuxml2argvtest.c
index 74d930ebe2..481c1ec8bc 100644
--- a/tests/qemuxml2argvtest.c
+++ b/tests/qemuxml2argvtest.c
@@ -951,7 +951,8 @@ mymain(void)
     DO_TEST("hugepages-pages5", NONE);
     DO_TEST("hugepages-pages6", NONE);
     DO_TEST("hugepages-pages7",
-            QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_OBJECT_MEMORY_FILE);
+            QEMU_CAPS_DEVICE_PC_DIMM, QEMU_CAPS_OBJECT_MEMORY_FILE,
+            QEMU_CAPS_OBJECT_MEMORY_FILE_DISCARD);
     DO_TEST("hugepages-memaccess", QEMU_CAPS_OBJECT_MEMORY_FILE,
             QEMU_CAPS_OBJECT_MEMORY_RAM, QEMU_CAPS_DEVICE_PC_DIMM,
             QEMU_CAPS_NUMA);
-- 
2.16.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list