From: Wim ten Have <wim.ten.have@oracle.com>
Add libvirtd NUMA cell domain configuration functionality to fully
describe sibling distances between the underlying cell ids when
configuring HVM guests.
[below is an example of a 4 node setup]
  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='10'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='41'/>
        </distances>
      </cell>
      <cell id='1' cpus='1' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='10'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='41'/>
        </distances>
      </cell>
      <cell id='2' cpus='2' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='31'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='10'/>
          <sibling id='3' value='21'/>
        </distances>
      </cell>
      <cell id='3' cpus='3' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='41'/>
          <sibling id='1' value='31'/>
          <sibling id='2' value='21'/>
          <sibling id='3' value='10'/>
        </distances>
      </cell>
    </numa>
  </cpu>
Changes under this commit add the required data structures, the
virDomainNuma* functional routines, and the XML schema enhancements
to enforce appropriate configuration.
These changes are accompanied by topic-related documentation under
docs/formatdomain.html, extending the "Guest NUMA topology" paragraph
within the "CPU model and topology" section.
For terminology we refer to sockets as "nodes", where access to each
other's distinct resources such as memory makes them "siblings" with a
designated "distance" between them. A specific design is described in
the ACPI (Advanced Configuration and Power Interface) Specification,
in the chapter explaining the system's SLIT (System Locality Distance
Information Table).
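
Rendered as the SLIT-style distance matrix it encodes, the 4-node
example above reads:

          node 0  node 1  node 2  node 3
  node 0      10      21      31      41
  node 1      21      10      31      41
  node 2      31      21      10      21
  node 3      41      31      21      10

Note that the matrix need not be symmetric (here node 1 -> 2 is 31
while node 2 -> 1 is 21); distances are directional.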
These patches extend core libvirt's XML description of a virtual machine's
hardware to include NUMA distance information for sibling nodes, which
is then passed to Xen guests via libxl. QEMU recently landed support for
constructing the SLIT in commit 0f203430dd ("numa: Allow setting NUMA
distance for different NUMA nodes"), hence these core libvirt extensions
can also help other drivers in supporting this feature.
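
As an illustration (not part of this patch), the distances above would
map to QEMU command-line options of roughly this shape, assuming a QEMU
new enough to contain the commit above:

  -numa node,nodeid=0,cpus=0,mem=2048 \
  -numa node,nodeid=1,cpus=1,mem=2048 \
  -numa dist,src=0,dst=1,val=21 \
  -numa dist,src=1,dst=0,val=21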
These changes alter docs/schemas/cputypes.rng, requiring domain
configuration to follow the syntax below per numa cell id.
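
That fragment, reproduced from the cputypes.rng hunk in this patch:

  <optional>
    <element name="distances">
      <oneOrMore>
        <ref name="numaDistance"/>
      </oneOrMore>
    </element>
  </optional>

  <define name="numaDistance">
    <element name="sibling">
      <attribute name="id">
        <ref name="unsignedInt"/>
      </attribute>
      <attribute name="value">
        <ref name="numaDistanceValue"/>
      </attribute>
    </element>
  </define>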
These changes also alter docs/schemas/basictypes.rng to add
"numaDistanceValue", an "unsignedInt" with a minimum value of 10,
since 0-9 are reserved values that cannot be used as System Locality
Distance Information Table data; see the fragment below.
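
Again reproduced from the basictypes.rng hunk in this patch:

  <define name="numaDistanceValue">
    <choice>
      <data type="unsignedInt">
        <param name="minInclusive">10</param>
      </data>
    </choice>
  </define>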
Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
docs/formatdomain.html.in | 64 ++++++++++-
docs/schemas/basictypes.rng | 8 ++
docs/schemas/cputypes.rng | 18 +++
src/conf/cpu_conf.c | 2 +-
src/conf/numa_conf.c | 260 +++++++++++++++++++++++++++++++++++++++++++-
src/conf/numa_conf.h | 25 ++++-
src/libvirt_private.syms | 6 +
7 files changed, 376 insertions(+), 7 deletions(-)
diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 36bea67..b9736e3 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1510,7 +1510,69 @@
</p>
<p>
- This guest NUMA specification is currently available only for QEMU/KVM.
+ This guest NUMA specification is currently available only for
+ QEMU/KVM and Xen. The Xen driver additionally allows for a distinct
+ description of NUMA-arranged <code>sibling</code> <code>cell</code>
+ <code>distances</code> <span class="since">Since 3.6.0</span>.
+ </p>
+
+ <p>
+ Under a NUMA h/w architecture, distinct resources such as memory
+ create a designated distance between a <code>cell</code> and its
+ <code>siblings</code> that can now be described with the help of
+ <code>distances</code>. A detailed description can be found in
+ the ACPI (Advanced Configuration and Power Interface Specification),
+ in the chapter explaining the system's SLIT (System Locality
+ Distance Information Table).
+ </p>
+
+<pre>
+...
+<cpu>
+ ...
+ <numa>
+ <cell id='0' cpus='0,4-7' memory='512000' unit='KiB'>
+ <distances>
+ <sibling id='0' value='10'/>
+ <sibling id='1' value='21'/>
+ <sibling id='2' value='31'/>
+ <sibling id='3' value='41'/>
+ </distances>
+ </cell>
+ <cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'>
+ <distances>
+ <sibling id='0' value='21'/>
+ <sibling id='1' value='10'/>
+ <sibling id='2' value='21'/>
+ <sibling id='3' value='31'/>
+ </distances>
+ </cell>
+ <cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'>
+ <distances>
+ <sibling id='0' value='31'/>
+ <sibling id='1' value='21'/>
+ <sibling id='2' value='10'/>
+ <sibling id='3' value='21'/>
+ </distances>
+ </cell>
+ <cell id='3' cpus='3' memory='512000' unit='KiB'>
+ <distances>
+ <sibling id='0' value='41'/>
+ <sibling id='1' value='31'/>
+ <sibling id='2' value='21'/>
+ <sibling id='3' value='10'/>
+ </distances>
+ </cell>
+ </numa>
+ ...
+</cpu>
+...</pre>
+
+ <p>
+ Under the Xen driver, if no <code>distances</code> are given to
+ describe the SLIT data between different cells, it will default
+ to a scheme using 10 for local and 21 for other cells, which is
+ what a guest OS assumes when no SLIT is specified.
</p>
<h3><a name="elementsEvents">Events configuration</a></h3>
diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng
index 1ea667c..a335b5d 100644
--- a/docs/schemas/basictypes.rng
+++ b/docs/schemas/basictypes.rng
@@ -77,6 +77,14 @@
</choice>
</define>
+ <define name="numaDistanceValue">
+ <choice>
+ <data type="unsignedInt">
+ <param name="minInclusive">10</param>
+ </data>
+ </choice>
+ </define>
+
<define name="pciaddress">
<optional>
<attribute name="domain">
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index 3eef16a..c45b6df 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -129,6 +129,24 @@
</choice>
</attribute>
</optional>
+ <optional>
+ <element name="distances">
+ <oneOrMore>
+ <ref name="numaDistance"/>
+ </oneOrMore>
+ </element>
+ </optional>
+ </element>
+ </define>
+
+ <define name="numaDistance">
+ <element name="sibling">
+ <attribute name="id">
+ <ref name="unsignedInt"/>
+ </attribute>
+ <attribute name="value">
+ <ref name="numaDistanceValue"/>
+ </attribute>
</element>
</define>
diff --git a/src/conf/cpu_conf.c b/src/conf/cpu_conf.c
index da40e9b..5d8f7be3 100644
--- a/src/conf/cpu_conf.c
+++ b/src/conf/cpu_conf.c
@@ -643,7 +643,7 @@ virCPUDefFormatBufFull(virBufferPtr buf,
if (virCPUDefFormatBuf(&childrenBuf, def, updateCPU) < 0)
goto cleanup;
- if (virDomainNumaDefCPUFormat(&childrenBuf, numa) < 0)
+ if (virDomainNumaDefCPUFormatXML(&childrenBuf, numa) < 0)
goto cleanup;
/* Put it all together */
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index bfd3703..1914810 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -48,6 +48,8 @@ VIR_ENUM_IMPL(virDomainMemoryAccess, VIR_DOMAIN_MEMORY_ACCESS_LAST,
"shared",
"private")
+typedef struct _virDomainNumaDistance virDomainNumaDistance;
+typedef virDomainNumaDistance *virDomainNumaDistancePtr;
typedef struct _virDomainNumaNode virDomainNumaNode;
typedef virDomainNumaNode *virDomainNumaNodePtr;
@@ -66,6 +68,12 @@ struct _virDomainNuma {
virBitmapPtr nodeset; /* host memory nodes where this guest node resides */
virDomainNumatuneMemMode mode; /* memory mode selection */
virDomainMemoryAccess memAccess; /* shared memory access configuration */
+
+ struct _virDomainNumaDistance {
+ unsigned int value; /* locality value for node i*j */
+ unsigned int cellid;
+ } *distances; /* remote node distances */
+ size_t ndistances;
} *mem_nodes; /* guest node configuration */
size_t nmem_nodes;
@@ -686,6 +694,95 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
}
+static int
+virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def,
+ xmlXPathContextPtr ctxt,
+ unsigned int cur_cell)
+{
+ int ret = -1;
+ char *tmp = NULL;
+ size_t i;
+ xmlNodePtr *nodes = NULL;
+ int ndistances;
+ virDomainNumaDistancePtr distances = NULL;
+
+
+ if (!virXPathNode("./distances[1]/sibling", ctxt))
+ return 0;
+
+ if ((ndistances = virXPathNodeSet("./distances[1]/sibling", ctxt, &nodes)) <= 0) {
+ virReportError(VIR_ERR_XML_ERROR, "%s",
+ _("NUMA distances defined without siblings"));
+ goto cleanup;
+ }
+
+ if (ndistances < def->nmem_nodes) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("NUMA distances defined with fewer siblings than nodes for cell id: '%d'"),
+ cur_cell);
+ goto cleanup;
+ }
+
+ if (VIR_ALLOC_N(distances, ndistances) < 0)
+ goto cleanup;
+
+ for (i = 0; i < ndistances; i++) {
+ unsigned int sibling_id = i, sibling_value;
+
+ /* siblings are in order of parsing or explicitly numbered */
+ if ((tmp = virXMLPropString(nodes[i], "id"))) {
+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'id' attribute in NUMA distances for sibling: '%s'"),
+ tmp);
+ goto cleanup;
+ }
+
+ if (sibling_id >= ndistances) {
+ virReportError(VIR_ERR_XML_ERROR, "%s",
+ _("Exactly one 'sibling' element per NUMA distance "
+ "is allowed, non-contiguous ranges or ranges not "
+ "starting from 0 are not allowed"));
+ goto cleanup;
+ }
+ }
+ VIR_FREE(tmp);
+
+ /* We need a locality value */
+ if (!(tmp = virXMLPropString(nodes[i], "value"))) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Missing 'value' attribute in NUMA distances for sibling id: '%d'"),
+ sibling_id);
+ goto cleanup;
+ }
+
+ /* It needs to be applicable */
+ if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
+ virReportError(VIR_ERR_XML_ERROR,
+ _("Invalid 'value' attribute in NUMA distances for sibling id: '%d'"),
+ sibling_id);
+ goto cleanup;
+ }
+ VIR_FREE(tmp);
+
+ distances[sibling_id].cellid = sibling_id;
+ distances[sibling_id].value = sibling_value;
+ }
+
+ def->mem_nodes[cur_cell].distances = distances;
+ def->mem_nodes[cur_cell].ndistances = ndistances;
+
+ ret = 0;
+
+ cleanup:
+ if (ret)
+ VIR_FREE(distances);
+ VIR_FREE(nodes);
+ VIR_FREE(tmp);
+
+ return ret;
+}
+
int
virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
xmlXPathContextPtr ctxt)
@@ -788,6 +885,14 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
def->mem_nodes[cur_cell].memAccess = rc;
VIR_FREE(tmp);
}
+
+ /* Parse NUMA distances info */
+ if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0) {
+ virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
+ _("NUMA cell %d has incorrect 'distances' configured"),
+ cur_cell);
+ goto cleanup;
+ }
}
ret = 0;
@@ -801,8 +906,8 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
int
-virDomainNumaDefCPUFormat(virBufferPtr buf,
- virDomainNumaPtr def)
+virDomainNumaDefCPUFormatXML(virBufferPtr buf,
+ virDomainNumaPtr def)
{
virDomainMemoryAccess memAccess;
char *cpustr;
@@ -815,6 +920,8 @@ virDomainNumaDefCPUFormat(virBufferPtr buf,
virBufferAddLit(buf, "<numa>\n");
virBufferAdjustIndent(buf, 2);
for (i = 0; i < ncells; i++) {
+ int ndistances;
+
memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i);
if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i))))
@@ -829,7 +936,30 @@ virDomainNumaDefCPUFormat(virBufferPtr buf,
if (memAccess)
virBufferAsprintf(buf, " memAccess='%s'",
virDomainMemoryAccessTypeToString(memAccess));
- virBufferAddLit(buf, "/>\n");
+
+ ndistances = def->mem_nodes[i].ndistances;
+ if (!ndistances) {
+ virBufferAddLit(buf, "/>\n");
+ } else {
+ size_t j;
+ virDomainNumaDistancePtr distances = def->mem_nodes[i].distances;
+
+ virBufferAddLit(buf, ">\n");
+ virBufferAdjustIndent(buf, 2);
+ virBufferAddLit(buf, "<distances>\n");
+ virBufferAdjustIndent(buf, 2);
+ for (j = 0; j < ndistances; j++) {
+ virBufferAddLit(buf, "<sibling");
+ virBufferAsprintf(buf, " id='%d'", distances[j].cellid);
+ virBufferAsprintf(buf, " value='%d'", distances[j].value);
+ virBufferAddLit(buf, "/>\n");
+ }
+ virBufferAdjustIndent(buf, -2);
+ virBufferAddLit(buf, "</distances>\n");
+ virBufferAdjustIndent(buf, -2);
+ virBufferAddLit(buf, "</cell>\n");
+ }
+
VIR_FREE(cpustr);
}
virBufferAdjustIndent(buf, -2);
@@ -922,13 +1052,121 @@ virDomainNumaCheckABIStability(virDomainNumaPtr src,
size_t
virDomainNumaGetNodeCount(virDomainNumaPtr numa)
{
- if (!numa)
+ if (!numa || !numa->mem_nodes)
return 0;
return numa->nmem_nodes;
}
+size_t
+virDomainNumaSetNodeCount(virDomainNumaPtr numa, size_t nmem_nodes)
+{
+ if (!numa || !nmem_nodes)
+ return 0;
+
+ if (numa->mem_nodes) {
+ virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+ _("Cannot alter an existing nmem_nodes set"));
+ return 0;
+ }
+
+ if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0)
+ return 0;
+
+ numa->nmem_nodes = nmem_nodes;
+
+ return numa->nmem_nodes;
+}
+
+
+size_t
+virDomainNumaGetNodeDistance(virDomainNumaPtr numa,
+ size_t node,
+ size_t cellid)
+{
+ virDomainNumaDistancePtr distances;
+
+ if (!numa)
+ return 0;
+
+ distances = numa->mem_nodes[node].distances;
+ if (!numa->mem_nodes[node].ndistances || !distances)
+ return 0;
+
+ return distances[cellid].value;
+}
+
+
+size_t
+virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
+ size_t node,
+ size_t cellid,
+ unsigned int value)
+{
+ virDomainNumaDistancePtr distances;
+
+ /*
+ * Advanced Configuration and Power Interface
+ * Specification version 6.1. Chapter 5.2.17
+ * System Locality Distance Information Table
+ * ... Distance values of 0-9 are reserved.
+ */
+ if (!numa || value < 10)
+ return 0;
+
+ distances = numa->mem_nodes[node].distances;
+
+ if (numa->mem_nodes[node].ndistances > 0 &&
+ distances[cellid].value)
+ return 0;
+
+ distances[cellid].cellid = cellid;
+ distances[cellid].value = value;
+
+ return distances[cellid].value;
+}
+
+
+size_t
+virDomainNumaGetNodeDistanceCount(virDomainNumaPtr numa,
+ size_t node)
+{
+ if (!numa)
+ return 0;
+
+ return numa->mem_nodes[node].ndistances;
+}
+
+
+size_t
+virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
+ size_t node,
+ size_t ndistances)
+{
+ virDomainNumaDistancePtr distances;
+
+ if (!numa || !ndistances)
+ return 0;
+
+ distances = numa->mem_nodes[node].distances;
+ if (distances) {
+ virReportError(VIR_ERR_INTERNAL_ERROR,
+ _("Cannot alter an existing nmem_nodes distances set for node: %zu"),
+ node);
+ return 0;
+ }
+
+ if (VIR_ALLOC_N(distances, ndistances) < 0)
+ return 0;
+
+ numa->mem_nodes[node].distances = distances;
+ numa->mem_nodes[node].ndistances = ndistances;
+
+ return numa->mem_nodes[node].ndistances;
+}
+
+
virBitmapPtr
virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
size_t node)
@@ -937,6 +1175,20 @@ virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
}
+virBitmapPtr
+virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
+ size_t node,
+ virBitmapPtr cpumask)
+{
+ if (!numa || !cpumask)
+ return NULL;
+
+ numa->mem_nodes[node].cpumask = cpumask;
+
+ return numa->mem_nodes[node].cpumask;
+}
+
+
virDomainMemoryAccess
virDomainNumaGetNodeMemoryAccessMode(virDomainNumaPtr numa,
size_t node)
diff --git a/src/conf/numa_conf.h b/src/conf/numa_conf.h
index b6a5354..96dda90 100644
--- a/src/conf/numa_conf.h
+++ b/src/conf/numa_conf.h
@@ -87,12 +87,35 @@ int virDomainNumatuneMaybeGetNodeset(virDomainNumaPtr numatune,
size_t virDomainNumaGetNodeCount(virDomainNumaPtr numa);
+size_t virDomainNumaSetNodeCount(virDomainNumaPtr numa,
+ size_t nmem_nodes);
+
+size_t virDomainNumaGetNodeDistance(virDomainNumaPtr numa,
+ size_t node,
+ size_t sibling)
+ ATTRIBUTE_NONNULL(1);
+size_t virDomainNumaSetNodeDistance(virDomainNumaPtr numa,
+ size_t node,
+ size_t sibling,
+ unsigned int value)
+ ATTRIBUTE_NONNULL(1);
+size_t virDomainNumaGetNodeDistanceCount(virDomainNumaPtr numa,
+ size_t node)
+ ATTRIBUTE_NONNULL(1);
+size_t virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
+ size_t node,
+ size_t ndistances)
+ ATTRIBUTE_NONNULL(1);
virBitmapPtr virDomainNumaGetNodeCpumask(virDomainNumaPtr numa,
size_t node)
ATTRIBUTE_NONNULL(1);
virDomainMemoryAccess virDomainNumaGetNodeMemoryAccessMode(virDomainNumaPtr numa,
size_t node)
ATTRIBUTE_NONNULL(1);
+virBitmapPtr virDomainNumaSetNodeCpumask(virDomainNumaPtr numa,
+ size_t node,
+ virBitmapPtr cpumask)
+ ATTRIBUTE_NONNULL(1);
unsigned long long virDomainNumaGetNodeMemorySize(virDomainNumaPtr numa,
size_t node)
ATTRIBUTE_NONNULL(1);
@@ -151,7 +174,7 @@ bool virDomainNumatuneNodeSpecified(virDomainNumaPtr numatune,
int cellid);
int virDomainNumaDefCPUParseXML(virDomainNumaPtr def, xmlXPathContextPtr ctxt);
-int virDomainNumaDefCPUFormat(virBufferPtr buf, virDomainNumaPtr def);
+int virDomainNumaDefCPUFormatXML(virBufferPtr buf, virDomainNumaPtr def);
unsigned int virDomainNumaGetCPUCountTotal(virDomainNumaPtr numa);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 888412a..b4887c1 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -705,9 +705,15 @@ virDomainNumaGetMaxCPUID;
virDomainNumaGetMemorySize;
virDomainNumaGetNodeCount;
virDomainNumaGetNodeCpumask;
+virDomainNumaGetNodeDistance;
+virDomainNumaGetNodeDistanceCount;
virDomainNumaGetNodeMemoryAccessMode;
virDomainNumaGetNodeMemorySize;
virDomainNumaNew;
+virDomainNumaSetNodeCount;
+virDomainNumaSetNodeCpumask;
+virDomainNumaSetNodeDistance;
+virDomainNumaSetNodeDistanceCount;
virDomainNumaSetNodeMemorySize;
virDomainNumatuneFormatNodeset;
virDomainNumatuneFormatXML;
--
2.9.4
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Wed, Jun 28, 2017 at 03:56:36PM +0200, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have@oracle.com>
>
> Add libvirtd NUMA cell domain configuration functionality to fully
> describe sibling distances between the underlying cell ids when
> configuring HVM guests.
[...]
> +     <p>
> +       Under the Xen driver, if no <code>distances</code> are given to
> +       describe the SLIT data between different cells, it will default
> +       to a scheme using 10 for local and 21 for other cells, which is
> +       what a guest OS assumes when no SLIT is specified.
>      </p>

We need to specify what happens if partial data is given too. In particular
if you give a distance for A -> B, and not B -> A, then we should assume
symmetry.

Rather than having the defaults be duplicated in drivers, we should have
a helper API in domain_conf.c that can be used to query the distance
between 2 nodes that can be called by drivers. This can then apply
defaults in a consistent manner.

> @@ -66,6 +68,12 @@ struct _virDomainNuma {
> +    struct _virDomainNumaDistance {
> +        unsigned int value; /* locality value for node i*j */

Strictly speaking distances have an ordering, ie this is 'node i -> j'
in that specific direction. 'node j -> i' can be different.

> +    unsigned int cellid;

Needs extra indent - 4 spaces per indent level

> +            if (sibling_id >= ndistances) {
> +                virReportError(VIR_ERR_XML_ERROR, "%s",
> +                               _("Exactly one 'sibling' element per NUMA distance "
> +                                 "is allowed, non-contiguous ranges or ranges not "
> +                                 "starting from 0 are not allowed"));

We should allow non-contiguous ranges, starting from any node id. ie users
should be free to provide sparse information, relying on defaults for the
rest.

> +        /* Parse NUMA distances info */
> +        if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0) {
> +            virReportError(VIR_ERR_CONFIG_UNSUPPORTED,
> +                           _("NUMA cell %d has incorrect 'distances' configured"),
> +                           cur_cell);
> +            goto cleanup;
> +        }

This method already reported a detailed error message, which is being
overwritten here with a useless generic message.

> +size_t
> +virDomainNumaSetNodeCount(virDomainNumaPtr numa, size_t nmem_nodes)
> +{
> +    if (!numa || !nmem_nodes)
> +        return 0;

This error condition doesn't report any error message...

> +    if (numa->mem_nodes) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("Cannot alter an existing nmem_nodes set"));
> +        return 0;
> +    }

...but this does...

> +    if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0)
> +        return 0;

...and so does this. Callers need to have consistent error reporting for
all scenarios.

> +size_t
> +virDomainNumaGetNodeDistance(virDomainNumaPtr numa,
> +                             size_t node,
> +                             size_t cellid)
> +{
> +    virDomainNumaDistancePtr distances;
> +
> +    if (!numa)
> +        return 0;

Just mark 'numa' parameters as ATTRIBUTE_NONNULL and make sure callers
don't run this method if numa is not configured.

> +    distances = numa->mem_nodes[node].distances;

You've not bounds-checked 'node' vs mem_nodes size.

> +    if (!numa->mem_nodes[node].ndistances || !distances)
> +        return 0;

This is where we should plug in defaults for partially specified data.

> +    return distances[cellid].value;
> +}

What are calling semantics here ? I think we should

> +    /*
> +     * Advanced Configuration and Power Interface
> +     * Specification version 6.1. Chapter 5.2.17
> +     * System Locality Distance Information Table
> +     * ... Distance values of 0-9 are reserved.
> +     */
> +    if (!numa || value < 10)
> +        return 0;

ATTRIBUTE_NONNULL instead, and report an error for values < 10.

> +    distances = numa->mem_nodes[node].distances;
> +
> +    if (numa->mem_nodes[node].ndistances > 0 &&
> +        distances[cellid].value)
> +        return 0;

I'm not sure what this check is doing - it seems to be refusing to let you
change the value if one is already set. I don't see the point of that, but
if it is needed then we should report errors.

> +size_t
> +virDomainNumaGetNodeDistanceCount(virDomainNumaPtr numa,
> +                                  size_t node)
> +{
> +    if (!numa)
> +        return 0;

ATTRIBUTE_NONNULL instead.

> +size_t
> +virDomainNumaSetNodeDistanceCount(virDomainNumaPtr numa,
> +                                  size_t node,
> +                                  size_t ndistances)
> +{
> +    if (!numa || !ndistances)
> +        return 0;

ATTRIBUTE_NONNULL for the first. For the second we should presumably free
any existing distance data.

[...]

Regards,
Daniel

--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
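
To make the suggested helper concrete, here is a minimal standalone
sketch of a default-applying distance lookup (simplified stand-in types
and hypothetical names, not code from the posted patch): it bounds-checks
both indices, assumes symmetry when only one direction was specified, and
otherwise falls back to 10 (local) / 21 (remote):

  #include <stddef.h>

  #define LOCAL_DISTANCE  10   /* SLIT: distance of a node to itself */
  #define REMOTE_DISTANCE 21   /* SLIT: default distance to another node */

  typedef struct {
      unsigned int value;      /* 0 means "not specified" */
  } NumaDistance;

  typedef struct {
      NumaDistance *distances; /* NULL, or array of up to nnodes entries */
      size_t ndistances;
  } NumaNode;

  typedef struct {
      NumaNode *nodes;
      size_t nnodes;
  } NumaDef;

  /* Return the distance from 'node' to 'sibling', applying defaults in
   * one place so drivers do not duplicate them. */
  static unsigned int
  numaGetNodeDistance(const NumaDef *def, size_t node, size_t sibling)
  {
      if (node >= def->nnodes || sibling >= def->nnodes)
          return 0; /* caller error; a real helper would report it */

      /* Use the explicitly configured node -> sibling value if any. */
      if (sibling < def->nodes[node].ndistances &&
          def->nodes[node].distances[sibling].value)
          return def->nodes[node].distances[sibling].value;

      /* Assume symmetry: fall back to sibling -> node if specified. */
      if (node < def->nodes[sibling].ndistances &&
          def->nodes[sibling].distances[node].value)
          return def->nodes[sibling].distances[node].value;

      return node == sibling ? LOCAL_DISTANCE : REMOTE_DISTANCE;
  }

A driver would then always resolve distances through such a helper rather
than reading mem_nodes[].distances directly, so sparse or absent XML data
yields the same defaults everywhere.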
On Wed, 28 Jun 2017 15:21:29 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:

Hi Daniel,

Appreciate your input and comments. Let me work on the package and
return asap.

One quick question; for the future I am looking into a way to somehow
auto-tune node (cell/socket/cpu) assignment under the <numa> directive.
That would in a way bring intelligence into libvirt and is therefore
perhaps not wanted. Perhaps this is up to the s/w stack utilizing guests
per libvirt. If there are thoughts on such matter then I am very
interested to hear about them.

[...]
> Rather than having the defaults be duplicated in drivers, we should have
> a helper API in domain_conf.c that can be used to query the distance
> between 2 nodes that can be called by drivers. This can then apply
> defaults in a consistent manner.

This comment in a way partially helps towards my questions for auto-tune
as a prerequisite. On my question, do you think it is wise to explore, or
do you think libvirt is not the place to have this done?

[...]
> > +    if (VIR_ALLOC_N(numa->mem_nodes, nmem_nodes) < 0)
> > +        return 0;
>
> ...and so does this.

I may be missing your thought on this specific one. There is other use of
memory allocation under this layer, but if there is no memory left, how
could it possibly report other than have the specific function return an
error?

> Callers need to have consistent error reporting for all scenarios.

Let me carefully go over the whole again.

Regards,
- Wim.

[...]
On Mon, Jul 03, 2017 at 11:19:07AM +0200, Wim ten Have wrote:
> One quick question; for the future I am looking into a way to somehow
> auto-tune node (cell/socket/cpu) assignment under the <numa> directive.
> That would in a way bring intelligence into libvirt and is therefore
> perhaps not wanted.

On the <vcpu> element we have a placement=auto attribute, which tells
libvirt to pick host pCPU placement for vCPUs automatically. In the QEMU
driver, this asks 'numad' for best placement. This is only done during
initial guest startup - there's no active monitoring or adjustment
thereafter.

Anything more advanced than this is out of scope for libvirt and best
served by an external app above.

Regards,
Daniel
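
For reference, a sketch of the standard XML form of that attribute
(example values):

  <vcpu placement='auto'>8</vcpu>
  <numatune>
    <memory mode='strict' placement='auto'/>
  </numatune>

If <numatune> is omitted while <vcpu placement='auto'> is set, libvirt
derives the memory placement from the <vcpu> placement, so numad drives
both vCPU and memory placement at startup.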
On Wed, 28 Jun 2017 15:21:29 +0100
"Daniel P. Berrange" <berrange@redhat.com> wrote:
> On Wed, Jun 28, 2017 at 03:56:36PM +0200, Wim Ten Have wrote:
> > From: Wim ten Have <wim.ten.have@oracle.com>
> >
> > Add libvirtd NUMA cell domain configuration functionality to fully
> > describe sibling distances between the underlying cell ids when
> > configuring HVM guests.
[...]

Hi Daniel,

I think you're best to consult on the below issue too, given it is
related to the changes I am preparing but was obviously not yet
discussed. Your other hint towards symmetry on distance configuration
and autonomous actions made me approach the whole more carefully.

Working on <numa> <distances> support, specifically for the xenlight
driver, I consult QEMU/KVM behavior for certain topic-related details.
When describing a guest XML configuration in libvirt, there is a nuance
when <numa> comes into the game. Imagine the below.

  <vcpu placement='static' current='5'>8</vcpu>
  ...
  <cpu mode='host-passthrough'>
    <numa>
      <cell id='0' cpus='0,2' memory='262144' unit='KiB'/>
      <cell id='1' cpus='1,3' memory='262144' unit='KiB'/>
    </numa>
  </cpu>

Running QEMU/KVM with such a configuration onlines 5 CPUs. This is
wrong IMHO.

  <wtenhave@kvm26:35> lscpu
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                5
  On-line CPU(s) list:   0-4
  ...
  NUMA node(s):          2
  ...
  NUMA node0 CPU(s):     0,2,4
  NUMA node1 CPU(s):     1,3

My thinking is that <vcpu ... current='_5_' should have been computed
somewhere under the virDomainDefParseXML() -> virDomainDefPostParse()
path and forcibly set to '4'.

  <vcpu placement='static' current='4'>8</vcpu>

This is of course a <numa>-related issue. Is it worth fixing this
(QEMU/KVM) case? If doing so, would that potentially hurt specific use
and, that way, libvirt?

- Wim.

--
Wim ten Have | Consulting Member of Technical Staff
Oracle Linux and VM Development Engineering
ORACLE Nederland BV | Hertogswetering 163-167 | 3543 AS Utrecht/NL