[libvirt] [PATCH v4 2/5] numa: describe siblings distances within cells

Posted by Wim Ten Have 7 years, 8 months ago
From: Wim ten Have <wim.ten.have@oracle.com>

Add libvirtd NUMA cell domain administration functionality to
describe the underlying cell id sibling distances in full when
configuring HVM guests.

Schema updates are made to docs/schemas/cputypes.rng, enforcing that
domain configuration follows the syntax below for the numa cell id,
and to docs/schemas/basictypes.rng, adding "numaDistanceValue".

A minimum value of 10 represents LOCAL_DISTANCE; 0-9 are reserved
values and cannot be used as System Locality Distance Information.
A value of 20 represents the default setting of REMOTE_DISTANCE,
and a maximum value of 255 represents UNREACHABLE.

Effectively, any cell sibling can be assigned a distance value in the
range 'LOCAL_DISTANCE <= value <= UNREACHABLE'.
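
For reference, the range rule boils down to a one-byte bounds check.
Below is a minimal standalone sketch (not part of the patch; the
constant names match those introduced in src/conf/numa_conf.c, and
numaDistanceIsValid is a hypothetical helper for illustration):

  #include <stdbool.h>
  #include <stdio.h>

  /* SLIT distance bounds conforming to ACPI 2.0. */
  #define LOCAL_DISTANCE   10   /* minimum distance; 0-9 are reserved */
  #define REMOTE_DISTANCE  20   /* default for unspecified siblings */
  #define UNREACHABLE     255   /* maximum; a SLIT entry is one byte */

  /* A sibling distance is valid iff it lies within the SLIT bounds. */
  static bool
  numaDistanceIsValid(unsigned int value)
  {
      return value >= LOCAL_DISTANCE && value <= UNREACHABLE;
  }

  int
  main(void)
  {
      unsigned int samples[] = { 9, 10, 21, 255, 256 };
      size_t i;

      for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
          printf("%u -> %s\n", samples[i],
                 numaDistanceIsValid(samples[i]) ? "valid" : "rejected");
      return 0;
  }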

[below is an example of a 4-node setup]

  <cpu>
    <numa>
      <cell id='0' cpus='0' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='10'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='41'/>
        </distances>
      </cell>
      <cell id='1' cpus='1' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='21'/>
          <sibling id='1' value='10'/>
          <sibling id='2' value='31'/>
          <sibling id='3' value='41'/>
        </distances>
      </cell>
      <cell id='2' cpus='2' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='31'/>
          <sibling id='1' value='21'/>
          <sibling id='2' value='10'/>
          <sibling id='3' value='21'/>
        </distances>
      </cell>
      <cell id='3' cpus='3' memory='2097152' unit='KiB'>
        <distances>
          <sibling id='0' value='41'/>
          <sibling id='1' value='31'/>
          <sibling id='2' value='21'/>
          <sibling id='3' value='10'/>
        </distances>
      </cell>
    </numa>
  </cpu>

Whenever a sibling id matches the cell id, LOCAL_DISTANCE applies;
for any sibling id not explicitly covered, a default of
REMOTE_DISTANCE is used in internal computations.
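
As an illustration of that completion (a hypothetical cell, not taken
from the patch), given a 4-node guest where cell 0 only specifies its
distance to cell 3:

  <cell id='0' cpus='0' memory='2097152' unit='KiB'>
    <distances>
      <sibling id='3' value='41'/>
    </distances>
  </cell>

internally LOCAL_DISTANCE (10) applies from cell 0 to itself,
REMOTE_DISTANCE (20) is assumed towards cells 1 and 2, and by the
symmetry rule cell 3 is in turn assigned value 41 towards cell 0.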

Signed-off-by: Wim ten Have <wim.ten.have@oracle.com>
---
Changes on v1:
- Add changes to docs/formatdomain.html.in describing schema update.
Changes on v2:
- Automatically apply distance symmetry maintaining cell <-> sibling.
- Check for maximum '255' on numaDistanceValue.
- Automatically complete empty distance ranges.
- Check that sibling_id's are in range with cell identifiers.
- Allow non-contiguous ranges, starting from any node id.
- Respect parameters as ATTRIBUTE_NONNULL fix functions and callers.
- Add and apply topology for LOCAL_DISTANCE=10 and REMOTE_DISTANCE=20.
Changes on v3:
- Add UNREACHABLE if one locality is unreachable from another.
- Add code cleanup aligning function naming in a separated patch.
- Add numa related driver code in a separated patch.
- Remove <choice> from numaDistanceValue schema/basictypes.rng
- Correct doc changes.
---
 docs/formatdomain.html.in   |  63 +++++++++++++-
 docs/schemas/basictypes.rng |   7 ++
 docs/schemas/cputypes.rng   |  18 ++++
 src/conf/numa_conf.c        | 200 +++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 284 insertions(+), 4 deletions(-)

diff --git a/docs/formatdomain.html.in b/docs/formatdomain.html.in
index 8ca7637..c453d44 100644
--- a/docs/formatdomain.html.in
+++ b/docs/formatdomain.html.in
@@ -1529,7 +1529,68 @@
     </p>
 
     <p>
-      This guest NUMA specification is currently available only for QEMU/KVM.
+      This guest NUMA specification is currently available only for
+      QEMU/KVM and Xen.  The Xen driver additionally allows for a
+      distinct description of NUMA-arranged <code>sibling</code>
+      <code>cell</code> <code>distances</code> <span class="since">Since 3.6.0</span>.
+    </p>
+
+    <p>
+      Under a NUMA hardware architecture, distinct resources such as
+      memory impose a designated distance between a <code>cell</code>
+      and its <code>siblings</code>, which can now be described with
+      the help of <code>distances</code>. A detailed description can
+      be found in the ACPI (Advanced Configuration and Power Interface)
+      Specification, within the chapter explaining the system's SLIT
+      (System Locality Distance Information Table).
+    </p>
+
+<pre>
+...
+&lt;cpu&gt;
+  ...
+  &lt;numa&gt;
+    &lt;cell id='0' cpus='0,4-7' memory='512000' unit='KiB'&gt;
+      &lt;distances&gt;
+        &lt;sibling id='0' value='10'/&gt;
+        &lt;sibling id='1' value='21'/&gt;
+        &lt;sibling id='2' value='31'/&gt;
+        &lt;sibling id='3' value='41'/&gt;
+      &lt;/distances&gt;
+    &lt;/cell&gt;
+    &lt;cell id='1' cpus='1,8-10,12-15' memory='512000' unit='KiB' memAccess='shared'&gt;
+      &lt;distances&gt;
+        &lt;sibling id='0' value='21'/&gt;
+        &lt;sibling id='1' value='10'/&gt;
+        &lt;sibling id='2' value='21'/&gt;
+        &lt;sibling id='3' value='31'/&gt;
+      &lt;/distances&gt;
+    &lt;/cell&gt;
+    &lt;cell id='2' cpus='2,11' memory='512000' unit='KiB' memAccess='shared'&gt;
+      &lt;distances&gt;
+        &lt;sibling id='0' value='31'/&gt;
+        &lt;sibling id='1' value='21'/&gt;
+        &lt;sibling id='2' value='10'/&gt;
+        &lt;sibling id='3' value='21'/&gt;
+      &lt;/distances&gt;
+    &lt;/cell&gt;
+    &lt;cell id='3' cpus='3' memory='512000' unit='KiB'&gt;
+      &lt;distances&gt;
+        &lt;sibling id='0' value='41'/&gt;
+        &lt;sibling id='1' value='31'/&gt;
+        &lt;sibling id='2' value='21'/&gt;
+        &lt;sibling id='3' value='10'/&gt;
+      &lt;/distances&gt;
+    &lt;/cell&gt;
+  &lt;/numa&gt;
+  ...
+&lt;/cpu&gt;
+...</pre>
+
+    <p>
+      Under the Xen driver, if no <code>distances</code> are given to
+      describe the SLIT data between different cells, it will default to
+      a scheme using 10 for local and 20 for remote distances.
     </p>
 
     <h3><a id="elementsEvents">Events configuration</a></h3>
diff --git a/docs/schemas/basictypes.rng b/docs/schemas/basictypes.rng
index 1ea667c..1a18cd3 100644
--- a/docs/schemas/basictypes.rng
+++ b/docs/schemas/basictypes.rng
@@ -77,6 +77,13 @@
     </choice>
   </define>
 
+  <define name="numaDistanceValue">
+    <data type="unsignedInt">
+      <param name="minInclusive">10</param>
+      <param name="maxInclusive">255</param>
+    </data>
+  </define>
+
   <define name="pciaddress">
     <optional>
       <attribute name="domain">
diff --git a/docs/schemas/cputypes.rng b/docs/schemas/cputypes.rng
index 3eef16a..c45b6df 100644
--- a/docs/schemas/cputypes.rng
+++ b/docs/schemas/cputypes.rng
@@ -129,6 +129,24 @@
           </choice>
         </attribute>
       </optional>
+      <optional>
+        <element name="distances">
+          <oneOrMore>
+            <ref name="numaDistance"/>
+          </oneOrMore>
+        </element>
+      </optional>
+    </element>
+  </define>
+
+  <define name="numaDistance">
+    <element name="sibling">
+      <attribute name="id">
+        <ref name="unsignedInt"/>
+      </attribute>
+      <attribute name="value">
+        <ref name="numaDistanceValue"/>
+      </attribute>
     </element>
   </define>
 
diff --git a/src/conf/numa_conf.c b/src/conf/numa_conf.c
index b71dc01..5db4311 100644
--- a/src/conf/numa_conf.c
+++ b/src/conf/numa_conf.c
@@ -29,6 +29,15 @@
 #include "virnuma.h"
 #include "virstring.h"
 
+/*
+ * Distance definitions conforming to the ACPI 2.0 SLIT.
+ * See include/linux/topology.h
+ */
+#define LOCAL_DISTANCE          10
+#define REMOTE_DISTANCE         20
+/* SLIT entry value is a one-byte unsigned integer. */
+#define UNREACHABLE            255
+
 #define VIR_FROM_THIS VIR_FROM_DOMAIN
 
 VIR_ENUM_IMPL(virDomainNumatuneMemMode,
@@ -48,6 +57,8 @@ VIR_ENUM_IMPL(virDomainMemoryAccess, VIR_DOMAIN_MEMORY_ACCESS_LAST,
               "shared",
               "private")
 
+typedef struct _virDomainNumaDistance virDomainNumaDistance;
+typedef virDomainNumaDistance *virDomainNumaDistancePtr;
 
 typedef struct _virDomainNumaNode virDomainNumaNode;
 typedef virDomainNumaNode *virDomainNumaNodePtr;
@@ -66,6 +77,12 @@ struct _virDomainNuma {
         virBitmapPtr nodeset;   /* host memory nodes where this guest node resides */
         virDomainNumatuneMemMode mode;  /* memory mode selection */
         virDomainMemoryAccess memAccess; /* shared memory access configuration */
+
+        struct _virDomainNumaDistance {
+            unsigned int value; /* locality value for node i->j or j->i */
+            unsigned int cellid;
+        } *distances;           /* remote node distances */
+        size_t ndistances;
     } *mem_nodes;           /* guest node configuration */
     size_t nmem_nodes;
 
@@ -686,6 +703,153 @@ virDomainNumatuneNodesetIsAvailable(virDomainNumaPtr numatune,
 }
 
 
+static int
+virDomainNumaDefNodeDistanceParseXML(virDomainNumaPtr def,
+                                     xmlXPathContextPtr ctxt,
+                                     unsigned int cur_cell)
+{
+    int ret = -1;
+    int sibling;
+    char *tmp = NULL;
+    xmlNodePtr *nodes = NULL;
+    size_t i, ndistances = def->nmem_nodes;
+
+    if (!ndistances)
+        return 0;
+
+    /* check if NUMA distances definition is present */
+    if (!virXPathNode("./distances[1]", ctxt))
+        return 0;
+
+    if ((sibling = virXPathNodeSet("./distances[1]/sibling", ctxt, &nodes)) <= 0) {
+        virReportError(VIR_ERR_XML_ERROR, "%s",
+                       _("NUMA distances defined without siblings"));
+        goto cleanup;
+    }
+
+    for (i = 0; i < sibling; i++) {
+        virDomainNumaDistancePtr ldist, rdist;
+        unsigned int sibling_id, sibling_value;
+
+        /* siblings are in order of parsing or explicitly numbered */
+        if (!(tmp = virXMLPropString(nodes[i], "id"))) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Missing 'id' attribute in NUMA "
+                             "distances under 'cell id %d'"),
+                           cur_cell);
+            goto cleanup;
+        }
+
+        /* The "id" needs to parse as an unsigned integer */
+        if (virStrToLong_uip(tmp, NULL, 10, &sibling_id) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Invalid 'id' attribute in NUMA "
+                             "distances for sibling: '%s'"),
+                           tmp);
+            goto cleanup;
+        }
+        VIR_FREE(tmp);
+
+        /* The "id" needs to be within numa/cell range */
+        if (sibling_id >= ndistances) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("There is no cell administered matching "
+                             "'sibling_id %d' under NUMA 'cell id %d' "),
+                           sibling_id, cur_cell);
+            goto cleanup;
+        }
+
+        /* We need a locality value. Check and correct
+         * distance to local and distance to remote node.
+         */
+        if (!(tmp = virXMLPropString(nodes[i], "value"))) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Missing 'value' attribute in NUMA distances "
+                             "under 'cell id %d' for 'sibling id %d'"),
+                           cur_cell, sibling_id);
+            goto cleanup;
+        }
+
+        /* The "value" needs to parse as an unsigned integer */
+        if (virStrToLong_uip(tmp, NULL, 10, &sibling_value) < 0) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Invalid 'value' attribute in NUMA "
+                             "distances for value: '%s'"),
+                           tmp);
+            goto cleanup;
+        }
+        VIR_FREE(tmp);
+
+        /* LOCAL_DISTANCE <= "value" <= UNREACHABLE */
+        if (sibling_value < LOCAL_DISTANCE ||
+            sibling_value > UNREACHABLE) {
+            virReportError(VIR_ERR_XML_ERROR,
+                           _("Out of range value '%d' set for "
+                             "'sibling id %d' under NUMA 'cell id %d' "),
+                           sibling_value, sibling_id, cur_cell);
+            goto cleanup;
+        }
+
+        ldist = def->mem_nodes[cur_cell].distances;
+        if (!ldist) {
+            if (def->mem_nodes[cur_cell].ndistances) {
+                virReportError(VIR_ERR_XML_ERROR,
+                               _("Invalid 'ndistances' set in NUMA "
+                                 "distances for cell id: '%d'"),
+                               cur_cell);
+                goto cleanup;
+            }
+
+            if (VIR_ALLOC_N(ldist, ndistances) < 0)
+                goto cleanup;
+
+            if (!ldist[cur_cell].value)
+                ldist[cur_cell].value = LOCAL_DISTANCE;
+            ldist[cur_cell].cellid = cur_cell;
+            def->mem_nodes[cur_cell].ndistances = ndistances;
+        }
+
+        ldist[sibling_id].cellid = sibling_id;
+        ldist[sibling_id].value = sibling_value;
+        def->mem_nodes[cur_cell].distances = ldist;
+
+        rdist = def->mem_nodes[sibling_id].distances;
+        if (!rdist) {
+            if (def->mem_nodes[sibling_id].ndistances) {
+                virReportError(VIR_ERR_XML_ERROR,
+                               _("Invalid 'ndistances' set in NUMA "
+                                 "distances for sibling id: '%d'"),
+                               sibling_id);
+                goto cleanup;
+            }
+
+            if (VIR_ALLOC_N(rdist, ndistances) < 0)
+                goto cleanup;
+
+            if (!rdist[sibling_id].value)
+                rdist[sibling_id].value = LOCAL_DISTANCE;
+            rdist[sibling_id].cellid = sibling_id;
+            def->mem_nodes[sibling_id].ndistances = ndistances;
+        }
+
+        rdist[cur_cell].cellid = cur_cell;
+        rdist[cur_cell].value = sibling_value;
+        def->mem_nodes[sibling_id].distances = rdist;
+    }
+
+    ret = 0;
+
+ cleanup:
+    if (ret) {
+        for (i = 0; i < ndistances; i++)
+            VIR_FREE(def->mem_nodes[i].distances);
+    }
+    VIR_FREE(nodes);
+    VIR_FREE(tmp);
+
+    return ret;
+}
+
 int
 virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
                             xmlXPathContextPtr ctxt)
@@ -694,7 +858,7 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
     xmlNodePtr oldNode = ctxt->node;
     char *tmp = NULL;
     int n;
-    size_t i;
+    size_t i, j;
     int ret = -1;
 
     /* check if NUMA definition is present */
@@ -712,7 +876,6 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
     def->nmem_nodes = n;
 
     for (i = 0; i < n; i++) {
-        size_t j;
         int rc;
         unsigned int cur_cell = i;
 
@@ -788,6 +951,10 @@ virDomainNumaDefCPUParseXML(virDomainNumaPtr def,
             def->mem_nodes[cur_cell].memAccess = rc;
             VIR_FREE(tmp);
         }
+
+        /* Parse NUMA distances info */
+        if (virDomainNumaDefNodeDistanceParseXML(def, ctxt, cur_cell) < 0)
+            goto cleanup;
     }
 
     ret = 0;
@@ -815,6 +982,8 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
     virBufferAddLit(buf, "<numa>\n");
     virBufferAdjustIndent(buf, 2);
     for (i = 0; i < ncells; i++) {
+        size_t ndistances;
+
         memAccess = virDomainNumaGetNodeMemoryAccessMode(def, i);
 
         if (!(cpustr = virBitmapFormat(virDomainNumaGetNodeCpumask(def, i))))
@@ -829,7 +998,32 @@ virDomainNumaDefCPUFormatXML(virBufferPtr buf,
         if (memAccess)
             virBufferAsprintf(buf, " memAccess='%s'",
                               virDomainMemoryAccessTypeToString(memAccess));
-        virBufferAddLit(buf, "/>\n");
+
+        ndistances = def->mem_nodes[i].ndistances;
+        if (!ndistances) {
+            virBufferAddLit(buf, "/>\n");
+        } else {
+            size_t j;
+            virDomainNumaDistancePtr distances = def->mem_nodes[i].distances;
+
+            virBufferAddLit(buf, ">\n");
+            virBufferAdjustIndent(buf, 2);
+            virBufferAddLit(buf, "<distances>\n");
+            virBufferAdjustIndent(buf, 2);
+            for (j = 0; j < ndistances; j++) {
+                if (distances[j].value) {
+                    virBufferAddLit(buf, "<sibling");
+                    virBufferAsprintf(buf, " id='%d'", distances[j].cellid);
+                    virBufferAsprintf(buf, " value='%d'", distances[j].value);
+                    virBufferAddLit(buf, "/>\n");
+                }
+            }
+            virBufferAdjustIndent(buf, -2);
+            virBufferAddLit(buf, "</distances>\n");
+            virBufferAdjustIndent(buf, -2);
+            virBufferAddLit(buf, "</cell>\n");
+        }
+
         VIR_FREE(cpustr);
     }
     virBufferAdjustIndent(buf, -2);
-- 
2.9.5

Re: [libvirt] [PATCH v4 2/5] numa: describe siblings distances within cells
Posted by Jim Fehlig 7 years, 7 months ago
On 09/08/2017 08:47 AM, Wim Ten Have wrote:
> From: Wim ten Have <wim.ten.have@oracle.com>
> 
> Add libvirtd NUMA cell domain administration functionality to
> describe underlying cell id sibling distances in full fashion
> when configuring HVM guests.

May I suggest wording this paragraph as:

Add support for describing sibling vCPU distances within a domain's vNUMA cell 
configuration.

> Schema updates are made to docs/schemas/cputypes.rng enforcing domain
> administration to follow the syntax below the numa cell id and
> docs/schemas/basictypes.rng to add "numaDistanceValue".

I'm not sure this paragraph is needed in the commit message.

> A minimum value of 10 representing the LOCAL_DISTANCE as 0-9 are
> reserved values and can not be used as System Locality Distance Information.
> A value of 20 represents the default setting of REMOTE_DISTANCE
> where a maximum value of 255 represents UNREACHABLE.
> 
> Effectively any cell sibling can be assigned a distance value where
> practically 'LOCAL_DISTANCE <= value <= UNREACHABLE'.
> 
> [below is an example of a 4 node setup]
> 
>    <cpu>
>      <numa>
>        <cell id='0' cpus='0' memory='2097152' unit='KiB'>
>          <distances>
>            <sibling id='0' value='10'/>
>            <sibling id='1' value='21'/>
>            <sibling id='2' value='31'/>
>            <sibling id='3' value='41'/>
>          </distances>
>        </cell>
>        <cell id='1' cpus='1' memory='2097152' unit='KiB'>
>          <distances>
>            <sibling id='0' value='21'/>
>            <sibling id='1' value='10'/>
>            <sibling id='2' value='31'/>
>            <sibling id='3' value='41'/>
>          </distances>
>        </cell>
>        <cell id='2' cpus='2' memory='2097152' unit='KiB'>
>          <distances>
>            <sibling id='0' value='31'/>
>            <sibling id='1' value='21'/>
>            <sibling id='2' value='10'/>
>            <sibling id='3' value='21'/>
>          </distances>
>        <cell id='3' cpus='3' memory='2097152' unit='KiB'>
>          <distances>
>            <sibling id='0' value='41'/>
>            <sibling id='1' value='31'/>
>            <sibling id='2' value='21'/>
>            <sibling id='3' value='10'/>
>          </distances>
>        </cell>
>      </numa>
>    </cpu>

How would this look when having more than one cpu in a cell? I suppose something 
like

  <cpu>
     <numa>
       <cell id='0' cpus='0-3' memory='2097152' unit='KiB'>
         <distances>
           <sibling id='0' value='10'/>
           <sibling id='1' value='10'/>
           <sibling id='2' value='10'/>
           <sibling id='3' value='10'/>
           <sibling id='4' value='21'/>
           <sibling id='5' value='21'/>
           <sibling id='6' value='21'/>
           <sibling id='7' value='21'/>
         </distances>
       </cell>
       <cell id='1' cpus='4-7' memory='2097152' unit='KiB'>
         <distances>
           <sibling id='0' value='21'/>
           <sibling id='1' value='21'/>
           <sibling id='2' value='21'/>
           <sibling id='3' value='21'/>
           <sibling id='4' value='10'/>
           <sibling id='5' value='10'/>
           <sibling id='6' value='10'/>
           <sibling id='7' value='10'/>
         </distances>
      </cell>
    </numa>
  </cpu>

In the V3 thread you mentioned "And to reduce even more we could also
remove LOCAL_DISTANCES as they make a constant factor where; (cell_id == 
sibling_id)". In the above example cell_id 1 == sibling_id 1, but it is not 
LOCAL_DISTANCE.

> Whenever a sibling id the cell LOCAL_DISTANCE does apply and for any
> sibling id not being covered a default of REMOTE_DISTANCE is used
> for internal computations.

I'm having a hard time understanding this sentence...

I didn't look closely at the patch since I'd like to understand how multi-cpu 
cells are handled before doing so.

Regards,
Jim

Re: [libvirt] [PATCH v4 2/5] numa: describe siblings distances within cells
Posted by Wim ten Have 7 years, 7 months ago
On Fri, 6 Oct 2017 08:49:46 -0600
Jim Fehlig <jfehlig@suse.com> wrote:

> On 09/08/2017 08:47 AM, Wim Ten Have wrote:
> > From: Wim ten Have <wim.ten.have@oracle.com>
> > 
> > Add libvirtd NUMA cell domain administration functionality to
> > describe underlying cell id sibling distances in full fashion
> > when configuring HVM guests.  
> 
> May I suggest wording this paragraph as:
> 
> Add support for describing sibling vCPU distances within a domain's vNUMA cell 
> configuration.

  See below (v5 comment).

> > Schema updates are made to docs/schemas/cputypes.rng enforcing domain
> > administration to follow the syntax below the numa cell id and
> > docs/schemas/basictypes.rng to add "numaDistanceValue".  
> 
> I'm not sure this paragraph is needed in the commit message.
> 
> > A minimum value of 10 representing the LOCAL_DISTANCE as 0-9 are
> > reserved values and can not be used as System Locality Distance Information.
> > A value of 20 represents the default setting of REMOTE_DISTANCE
> > where a maximum value of 255 represents UNREACHABLE.
> > 
> > Effectively any cell sibling can be assigned a distance value where
> > practically 'LOCAL_DISTANCE <= value <= UNREACHABLE'.
> > 
> > [below is an example of a 4 node setup]
> > 
> >    <cpu>
> >      <numa>
> >        <cell id='0' cpus='0' memory='2097152' unit='KiB'>
> >          <distances>
> >            <sibling id='0' value='10'/>
> >            <sibling id='1' value='21'/>
> >            <sibling id='2' value='31'/>
> >            <sibling id='3' value='41'/>
> >          </distances>
> >        </cell>
> >        <cell id='1' cpus='1' memory='2097152' unit='KiB'>
> >          <distances>
> >            <sibling id='0' value='21'/>
> >            <sibling id='1' value='10'/>
> >            <sibling id='2' value='31'/>
> >            <sibling id='3' value='41'/>
> >          </distances>
> >        </cell>
> >        <cell id='2' cpus='2' memory='2097152' unit='KiB'>
> >          <distances>
> >            <sibling id='0' value='31'/>
> >            <sibling id='1' value='21'/>
> >            <sibling id='2' value='10'/>
> >            <sibling id='3' value='21'/>
> >          </distances>
> >        <cell id='3' cpus='3' memory='2097152' unit='KiB'>
> >          <distances>
> >            <sibling id='0' value='41'/>
> >            <sibling id='1' value='31'/>
> >            <sibling id='2' value='21'/>
> >            <sibling id='3' value='10'/>
> >          </distances>
> >        </cell>
> >      </numa>
> >    </cpu>  
> 
> How would this look when having more than one cpu in a cell? I suppose something 
> like
> 
>   <cpu>
>      <numa>
>        <cell id='0' cpus='0-3' memory='2097152' unit='KiB'>
>          <distances>
>            <sibling id='0' value='10'/>
>            <sibling id='1' value='10'/>
>            <sibling id='2' value='10'/>
>            <sibling id='3' value='10'/>
>            <sibling id='4' value='21'/>
>            <sibling id='5' value='21'/>
>            <sibling id='6' value='21'/>
>            <sibling id='7' value='21'/>
>          </distances>
>        </cell>
>        <cell id='1' cpus='4-7' memory='2097152' unit='KiB'>
>          <distances>
>            <sibling id='0' value='21'/>
>            <sibling id='1' value='21'/>
>            <sibling id='2' value='21'/>
>            <sibling id='3' value='21'/>
>            <sibling id='4' value='10'/>
>            <sibling id='5' value='10'/>
>            <sibling id='6' value='10'/>
>            <sibling id='7' value='10'/>
>          </distances>
>       </cell>
>     </numa>
>   </cpu>

  Nope. That configuration describes a 2-node vNUMA setup.

  Where;
  * NUMA node(0) defined by <cell id='0'> holds 4 (cores)
    cpus '0-3' with 2GByte of dedicated memory.
  * NUMA node(1) defined by <cell id='1'> holds 4 (cores)
    cpus '4-7' with 2GByte of dedicated memory.

      <cpu>
         <numa>
           <cell id='0' cpus='0-3' memory='2097152' unit='KiB'>
             <distances>
               <sibling id='0' value='10'/>
               <sibling id='1' value='21'/>
             </distances>
           </cell>
           <cell id='1' cpus='4-7' memory='2097152' unit='KiB'>
             <distances>
               <sibling id='0' value='21'/>
               <sibling id='1' value='10'/>
             </distances>
          </cell>
        </numa>
      </cpu>

  This specific configuration would typically report the following when
  examined from within the guest domain (ignoring, in this example, that
  it _DOES_ concern a single-socket 8-cpu machine).

      [root@f25 ~]# lscpu
      Architecture:          x86_64
      CPU op-mode(s):        32-bit, 64-bit
      Byte Order:            Little Endian
      CPU(s):                8
      On-line CPU(s) list:   0-7
      Thread(s) per core:    1
      Core(s) per socket:    8
      Socket(s):             1
  *>  NUMA node(s):          2
      Vendor ID:             AuthenticAMD
      CPU family:            21
      Model:                 2
      Model name:            AMD FX-8320E Eight-Core Processor
      Stepping:              0
      CPU MHz:               3210.862
      BogoMIPS:              6421.83
      Virtualization:        AMD-V
      Hypervisor vendor:     Xen
      Virtualization type:   full
      L1d cache:             16K
      L1i cache:             64K
      L2 cache:              2048K
      L3 cache:              8192K
  *>  NUMA node0 CPU(s):     0-3
  *>  NUMA node1 CPU(s):     4-7
      Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm svm cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs xop lwp fma4 tbm vmmcall bmi1 arat npt lbrv nrip_save tsc_scale vmcb_clean decodeassists pausefilter

      [root@f25 ~]# numactl -H
      available: 2 nodes (0-1)
      node 0 cpus: 0 1 2 3
      node 0 size: 1990 MB
      node 0 free: 1786 MB
      node 1 cpus: 4 5 6 7
      node 1 size: 1950 MB
      node 1 free: 1820 MB
      node distances:
      node   0   1 
        0:  10  21 
        1:  21  10 

> In the V3 thread you mentioned "And to reduce even more we could also
> remove LOCAL_DISTANCES as they make a constant factor where; (cell_id == 
> sibling_id)". In the above example cell_id 1 == sibling_id 1, but it is not 
> LOCAL_DISTANCE.
> 
> > Whenever a sibling id the cell LOCAL_DISTANCE does apply and for any
> > sibling id not being covered a default of REMOTE_DISTANCE is used
> > for internal computations.  
> 
> I'm having a hard time understanding this sentence...

  Me.2

> I didn't look closely at the patch since I'd like to understand how multi-cpu 
> cells are handled before doing so.

  Let me prepare v5.  I found a silly error in the code, which is being
  fixed, and given the confusion commented on above I'd like to take a
  better approach in the commit messages and within the cover letter.

Regards,
- Wim.

Re: [libvirt] [PATCH v4 2/5] numa: describe siblings distances within cells
Posted by Jim Fehlig 7 years, 7 months ago
On 10/12/2017 04:37 AM, Wim ten Have wrote:
> On Fri, 6 Oct 2017 08:49:46 -0600
> Jim Fehlig <jfehlig@suse.com> wrote:
> 
>> On 09/08/2017 08:47 AM, Wim Ten Have wrote:
>>> From: Wim ten Have <wim.ten.have@oracle.com>
>>>
>>> Add libvirtd NUMA cell domain administration functionality to
>>> describe underlying cell id sibling distances in full fashion
>>> when configuring HVM guests.
>>
>> May I suggest wording this paragraph as:
>>
>> Add support for describing sibling vCPU distances within a domain's vNUMA cell
>> configuration.
> 
>    See below (v5 comment).
> 
>>> Schema updates are made to docs/schemas/cputypes.rng enforcing domain
>>> administration to follow the syntax below the numa cell id and
>>> docs/schemas/basictypes.rng to add "numaDistanceValue".
>>
>> I'm not sure this paragraph is needed in the commit message.
>>
>>> A minimum value of 10 representing the LOCAL_DISTANCE as 0-9 are
>>> reserved values and can not be used as System Locality Distance Information.
>>> A value of 20 represents the default setting of REMOTE_DISTANCE
>>> where a maximum value of 255 represents UNREACHABLE.
>>>
>>> Effectively any cell sibling can be assigned a distance value where
>>> practically 'LOCAL_DISTANCE <= value <= UNREACHABLE'.
>>>
>>> [below is an example of a 4 node setup]
>>>
>>>     <cpu>
>>>       <numa>
>>>         <cell id='0' cpus='0' memory='2097152' unit='KiB'>
>>>           <distances>
>>>             <sibling id='0' value='10'/>
>>>             <sibling id='1' value='21'/>
>>>             <sibling id='2' value='31'/>
>>>             <sibling id='3' value='41'/>
>>>           </distances>
>>>         </cell>
>>>         <cell id='1' cpus='1' memory='2097152' unit='KiB'>
>>>           <distances>
>>>             <sibling id='0' value='21'/>
>>>             <sibling id='1' value='10'/>
>>>             <sibling id='2' value='31'/>
>>>             <sibling id='3' value='41'/>
>>>           </distances>
>>>         </cell>
>>>         <cell id='2' cpus='2' memory='2097152' unit='KiB'>
>>>           <distances>
>>>             <sibling id='0' value='31'/>
>>>             <sibling id='1' value='21'/>
>>>             <sibling id='2' value='10'/>
>>>             <sibling id='3' value='21'/>
>>>           </distances>
>>>         <cell id='3' cpus='3' memory='2097152' unit='KiB'>
>>>           <distances>
>>>             <sibling id='0' value='41'/>
>>>             <sibling id='1' value='31'/>
>>>             <sibling id='2' value='21'/>
>>>             <sibling id='3' value='10'/>
>>>           </distances>
>>>         </cell>
>>>       </numa>
>>>     </cpu>
>>
>> How would this look when having more than one cpu in a cell? I suppose something
>> like
>>
>>    <cpu>
>>       <numa>
>>         <cell id='0' cpus='0-3' memory='2097152' unit='KiB'>
>>           <distances>
>>             <sibling id='0' value='10'/>
>>             <sibling id='1' value='10'/>
>>             <sibling id='2' value='10'/>
>>             <sibling id='3' value='10'/>
>>             <sibling id='4' value='21'/>
>>             <sibling id='5' value='21'/>
>>             <sibling id='6' value='21'/>
>>             <sibling id='7' value='21'/>
>>           </distances>
>>         </cell>
>>         <cell id='1' cpus='4-7' memory='2097152' unit='KiB'>
>>           <distances>
>>             <sibling id='0' value='21'/>
>>             <sibling id='1' value='21'/>
>>             <sibling id='2' value='21'/>
>>             <sibling id='3' value='21'/>
>>             <sibling id='4' value='10'/>
>>             <sibling id='5' value='10'/>
>>             <sibling id='6' value='10'/>
>>             <sibling id='7' value='10'/>
>>           </distances>
>>        </cell>
>>      </numa>
>>    </cpu>
> 
>    Nope. That configuration describes a 2-node vNUMA setup.
> 
>    Where;
>    * NUMA node(0) defined by <cell id='0'> holds 4 (cores)
>      cpus '0-3' with 2GByte of dedicated memory.
>    * NUMA node(1) defined by <cell id='1'> holds 4 (cores)
>      cpus '4-7' with 2GByte of dedicated memory.

Correct.

>        <cpu>
>           <numa>
>             <cell id='0' cpus='0-3' memory='2097152' unit='KiB'>
>               <distances>
>                 <sibling id='0' value='10'/>
>                 <sibling id='1' value='21'/>
>               </distances>
>             </cell>
>             <cell id='1' cpus='4-7' memory='2097152' unit='KiB'>
>               <distances>
>                 <sibling id='0' value='21'/>
>                 <sibling id='1' value='10'/>
>               </distances>
>            </cell>
>          </numa>
>        </cpu>

Duh. sibling id='x' refers to cell with id 'x'. For some reason I had it stuck 
in my head that it referred to vcpu with id 'x'.

> 
>    This specific configuration would typically report the following when
>    examined from within the guest domain (ignoring, in this example, that
>    it _DOES_ concern a single-socket 8-cpu machine).
> 
>        [root@f25 ~]# lscpu
>        Architecture:          x86_64
>        CPU op-mode(s):        32-bit, 64-bit
>        Byte Order:            Little Endian
>        CPU(s):                8
>        On-line CPU(s) list:   0-7
>        Thread(s) per core:    1
>        Core(s) per socket:    8
>        Socket(s):             1
>    *>  NUMA node(s):          2
>        Vendor ID:             AuthenticAMD
>        CPU family:            21
>        Model:                 2
>        Model name:            AMD FX-8320E Eight-Core Processor
>        Stepping:              0
>        CPU MHz:               3210.862
>        BogoMIPS:              6421.83
>        Virtualization:        AMD-V
>        Hypervisor vendor:     Xen
>        Virtualization type:   full
>        L1d cache:             16K
>        L1i cache:             64K
>        L2 cache:              2048K
>        L3 cache:              8192K
>    *>  NUMA node0 CPU(s):     0-3
>    *>  NUMA node1 CPU(s):     4-7
>        Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm svm cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs xop lwp fma4 tbm vmmcall bmi1 arat npt lbrv nrip_save tsc_scale vmcb_clean decodeassists pausefilter
> 
>        [root@f25 ~]# numactl -H
>        available: 2 nodes (0-1)
>        node 0 cpus: 0 1 2 3
>        node 0 size: 1990 MB
>        node 0 free: 1786 MB
>        node 1 cpus: 4 5 6 7
>        node 1 size: 1950 MB
>        node 1 free: 1820 MB
>        node distances:
>        node   0   1
>          0:  10  21
>          1:  21  10

Right, got it.

> 
>> In the V3 thread you mentioned "And to reduce even more we could also
>> remove LOCAL_DISTANCES as they make a constant factor where; (cell_id ==
>> sibling_id)". In the above example cell_id 1 == sibling_id 1, but it is not
>> LOCAL_DISTANCE.
>>
>>> Whenever a sibling id the cell LOCAL_DISTANCE does apply and for any
>>> sibling id not being covered a default of REMOTE_DISTANCE is used
>>> for internal computations.
>>
>> I'm having a hard time understanding this sentence...
> 
>    Me.2
> 
>> I didn't look closely at the patch since I'd like to understand how multi-cpu
>> cells are handled before doing so.
> 
>    Let me prepare v5.  I found a silly error in the code, which is being
>    fixed, and given the confusion commented on above I'd like to take a
>    better approach in the commit messages and within the cover letter.

Thanks. Hopefully I'll have time to review it without much delay.

Regards,
Jim
