From: Bing Niu <bing.niu@intel.com>

This series introduces RDT memory bandwidth allocation support by
extending the current virresctrl implementation.

The Memory Bandwidth Allocation (MBA) feature provides indirect and
approximate control over the memory bandwidth available per core. It
provides a method to control applications which may be over-utilizing
bandwidth relative to their priority in environments such as the data
center. The details can be found in Intel's SDM 17.19.7. The kernel
supports MBA through the resctrl file system, the same as CAT. Each
resctrl group has an MB parameter that controls how much memory
bandwidth the group can utilize, expressed as a percentage.

In this series, MBA is enabled by enhancing the existing virresctrl
implementation. The policy employed for MBA is similar to CAT: the sum
of the MBA groups' bandwidths does not exceed 100%. The enhancement of
virresctrl consists of two parts:

Patch 1: Add two new structures, virResctrlInfoMB and virResctrlAllocMB,
for collecting the host system's MBA capability and the domain's memory
bandwidth allocation.

Patch 2: In frontend XML parsing, add a new element "llc" in the
cachetune section for MBA allocation.

Bing Niu (2):
  util: Add memory bandwidth support to resctrl
  conf: Extend cputune/cachetune to support memory bandwidth allocation

 src/conf/domain_conf.c |  86 +++++++++-
 src/util/virresctrl.c  | 422 ++++++++++++++++++++++++++++++++++++++++++++++---
 src/util/virresctrl.h  |   5 +
 3 files changed, 487 insertions(+), 26 deletions(-)

-- 
2.7.4
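For reference, the kernel exposes this control through a plain-text
'schemata' file inside each resctrl group directory. The values below
are purely illustrative (the L3 mask width and the group name depend on
the host):

    # cat /sys/fs/resctrl/<group>/schemata
    L3:0=fffff;1=fffff
    MB:0=60;1=100

Here the MB line caps the group at 60% of memory bandwidth on socket 0
while leaving socket 1 unthrottled.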
Ping for this series. Thanks a lot,
Bing

On 2018-05-29 18:58, bing.niu@intel.com wrote:
> From: Bing Niu <bing.niu@intel.com>
>
> This series introduces RDT memory bandwidth allocation support by
> extending the current virresctrl implementation.
>
> [...]
On Tue, May 29, 2018 at 06:58:01PM +0800, bing.niu@intel.com wrote:
> From: Bing Niu <bing.niu@intel.com>
>
> This series introduces RDT memory bandwidth allocation support by
> extending the current virresctrl implementation.
>
> [...]
>
> Patch 1: Add two new structures, virResctrlInfoMB and virResctrlAllocMB,
> for collecting the host system's MBA capability and the domain's memory
> bandwidth allocation.
>
> Patch 2: In frontend XML parsing, add a new element "llc" in the
> cachetune section for MBA allocation.

Hi,

Thanks for the patches. Before we start with the actual implementation
it would be nice to agree on the design.

------------------------------------------------------------------------

The first point is that we should do it similarly to the cache
allocation: we will not allow "sharing" the bandwidth, so the sum must
not exceed 100%, as you already have it in your patches, but we need to
do it in a way that lets us allow "sharing" the bandwidth in the future.

The second point is how the XML will look. There are two parts: one is
the capabilities XML and the second is the domain XML.

It looks like your patches don't expose any information in the
capabilities; we should do that in order to let management applications
know that the feature is available and what values they can use.

------------------------------------------------------------------------

I've tried to configure MBA on one machine that I have access to, which
has this CPU: 'Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz', and it behaves
strangely. If I configure 'schemata', the output of the 'pqos -s'
command is in some situations different:

    schemata     pqos -s output

    MB:0=10      MBA COS0 => 10% available
    MB:0=20      MBA COS0 => 20% available
    MB:0=30      MBA COS0 => 30% available
    MB:0=40      MBA COS0 => 40% available
    MB:0=50      MBA COS0 => 50% available
    MB:0=60      MBA COS0 => 60% available
    MB:0=70      MBA COS0 => 90% available
    MB:0=80      MBA COS0 => 90% available
    MB:0=90      MBA COS0 => 90% available
    MB:0=100     MBA COS0 => 100% available

If you look at the table you can see that for the values 70-90 pqos
shows that the available bandwidth is 90%.

Tested using Fedora 28:
    kernel-4.16.13-300.fc28.x86_64
    intel-cmt-cat-1.2.0-2.fc28.x86_64

------------------------------------------------------------------------

Since CAT (cache allocation technology) and MBA (memory bandwidth
allocation) are unrelated and control different limits, we should not
group MBA together with CAT in our XML files. From the poor
documentation available, it looks like MBA is related to the memory
controller.

Currently the cache allocation in the capabilities XML is reported like
this:

    <capabilities>
      <host>
        ...
        <cache>
          <bank id='0' level='3' type='both' size='30720' unit='KiB' cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
            <control granularity='1536' unit='KiB' type='code' maxAllocs='8'/>
            <control granularity='1536' unit='KiB' type='data' maxAllocs='8'/>
          </bank>
          <bank id='1' level='3' type='both' size='30720' unit='KiB' cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
            <control granularity='1536' unit='KiB' type='code' maxAllocs='8'/>
            <control granularity='1536' unit='KiB' type='data' maxAllocs='8'/>
          </bank>
        </cache>
        ...
      </host>
    </capabilities>

So the possible capabilities XML could look like this:

    <capabilities>
      <host>
        ...
        <memory>
          <bank id='0' cpus='0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46'>
            <control granularity='10' maxAllocs='8'/>
          </bank>
          <bank id='1' cpus='1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47'>
            <control granularity='10' maxAllocs='8'/>
          </bank>
        </memory>
        ...
      </host>
    </capabilities>

The element names 'memory' and 'bank' can be named differently;
suggestions are welcome.

Then there is the domain XML; for CAT we use this:

    <domain>
      ...
      <cputune>
        ...
        <cachetune vcpus='0-3'>
          <cache id='0' level='3' type='both' size='3' unit='MiB'/>
          <cache id='1' level='3' type='both' size='3' unit='MiB'/>
        </cachetune>
        ...
      </cputune>
      ...
    </domain>

so the possible domain XML could look like this:

    <domain>
      ...
      <cputune>
        ...
        <memory vcpus='0-3'>
          <socket id='0' bandwidth='30'/>
          <socket id='1' bandwidth='20'/>
        </memory>
        ...
      </cputune>
      ...
    </domain>

Again, the element names 'memory' and 'socket' can be named differently.

Pavel
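The capability values in the proposed <control granularity='10'
maxAllocs='8'/> element would come from the kernel's resctrl info
directory. Below is a minimal sketch of reading them, assuming resctrl
is mounted at /sys/fs/resctrl with the standard info files
(min_bandwidth, bandwidth_gran, num_closids); the helper name is
hypothetical, not part of libvirt's virresctrl API:

    /* Minimal sketch: read MBA capability values from the kernel's
     * resctrl filesystem. Assumes the standard layout; the helper name
     * is hypothetical, not libvirt's actual API. */
    #include <stdio.h>

    static int
    readMBInfoValue(const char *file, unsigned int *value)
    {
        char path[256];
        FILE *fp;
        int ret = -1;

        snprintf(path, sizeof(path), "/sys/fs/resctrl/info/MB/%s", file);
        if (!(fp = fopen(path, "r")))
            return -1;              /* file missing: MBA not supported */
        if (fscanf(fp, "%u", value) == 1)
            ret = 0;
        fclose(fp);
        return ret;
    }

    int main(void)
    {
        unsigned int gran, min, nclos;

        if (readMBInfoValue("bandwidth_gran", &gran) < 0 ||
            readMBInfoValue("min_bandwidth", &min) < 0 ||
            readMBInfoValue("num_closids", &nclos) < 0) {
            fprintf(stderr, "MBA not available via resctrl\n");
            return 1;
        }

        /* 'granularity' and 'maxAllocs' in the proposed capabilities
         * XML would map to bandwidth_gran and num_closids. */
        printf("granularity=%u%% min=%u%% maxAllocs=%u\n", gran, min, nclos);
        return 0;
    }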
Hi Pavel,
Thanks for your valuable input. Please see my response.

On 2018-06-05 20:11, Pavel Hrdina wrote:
> On Tue, May 29, 2018 at 06:58:01PM +0800, bing.niu@intel.com wrote:
>> From: Bing Niu <bing.niu@intel.com>
>>
>> This series introduces RDT memory bandwidth allocation support by
>> extending the current virresctrl implementation.
>>
>> [...]
>
> Hi,
>
> Thanks for the patches. Before we start with the actual implementation
> it would be nice to agree on the design.

Totally agree. The RFC code acts as a baseline for discussion.

> ------------------------------------------------------------------------
>
> The first point is that we should do it similarly to the cache
> allocation: we will not allow "sharing" the bandwidth, so the sum must
> not exceed 100%, as you already have it in your patches, but we need
> to do it in a way that lets us allow "sharing" the bandwidth in the
> future.

Yes, the memory bandwidth allocation policy is derived from the existing
CAT support in libvirt: no sharing or overlap. In the patch I follow the
existing CAT behavior when allocating memory bandwidth: first, calculate
the unused memory bandwidth by subtracting all existing RDT groups'
allocations. If we want to enable memory bandwidth sharing, we can
simply skip this part and do the allocation directly. Could this fit
your comment "we need to do it in a way that in the future we can allow
to 'share' the bandwidth"? If anything is missing or my understanding is
incorrect, please point it out. :)

> The second point is how the XML will look. There are two parts: one is
> the capabilities XML and the second is the domain XML.
>
> It looks like your patches don't expose any information in the
> capabilities; we should do that in order to let management
> applications know that the feature is available and what values they
> can use.
>
> ------------------------------------------------------------------------
>
> I've tried to configure MBA on one machine that I have access to,
> which has this CPU: 'Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz', and it
> behaves strangely. If I configure 'schemata', the output of the
> 'pqos -s' command is in some situations different:
>
>     schemata     pqos -s output
>
>     MB:0=10      MBA COS0 => 10% available
>     [...]
>     MB:0=70      MBA COS0 => 90% available
>     MB:0=80      MBA COS0 => 90% available
>     MB:0=90      MBA COS0 => 90% available
>     MB:0=100     MBA COS0 => 100% available
>
> If you look at the table you can see that for the values 70-90 pqos
> shows that the available bandwidth is 90%.
>
> Tested using Fedora 28:
>     kernel-4.16.13-300.fc28.x86_64
>     intel-cmt-cat-1.2.0-2.fc28.x86_64

Hmm, that is strange. I manipulate the resctrl fs directly, so I didn't
hit that kind of issue. I will take a look at the pqos package and let
you know.

> ------------------------------------------------------------------------
>
> Since CAT (cache allocation technology) and MBA (memory bandwidth
> allocation) are unrelated and control different limits, we should not
> group MBA together with CAT in our XML files. From the poor
> documentation available, it looks like MBA is related to the memory
> controller.

Per Intel SDM 17.19, MBA is used to control the request rate for
flushing data from the LLC to memory; usually MBA and the LLC have a 1:1
mapping relation. Yes, I missed the capability-exposing part. Thanks for
pointing that out.

> Currently the cache allocation in the capabilities XML is reported
> like this:
>
>     [capabilities XML with <cache>/<bank>/<control> elements as above]
>
> So the possible capabilities XML could look like this:
>
>     [proposed <memory>/<bank>/<control> XML as above]
>
> The element names 'memory' and 'bank' can be named differently;
> suggestions are welcome.

How about changing 'bank' to 'node'?

> Then there is the domain XML; for CAT we use this:
>
>     [existing <cachetune>/<cache> XML as above]
>
> so the possible domain XML could look like this:
>
>     <domain>
>       ...
>       <cputune>
>         ...
>         <memory vcpus='0-3'>
>           <socket id='0' bandwidth='30'/>
>           <socket id='1' bandwidth='20'/>
>         </memory>
>         ...
>       </cputune>
>       ...
>     </domain>
>
> Again, the element names 'memory' and 'socket' can be named
> differently.

'socket' --> 'node'?

Since the existing virresctrl implementation only cared about the cache
part during development, we may need to change the names of some
structures and functions when enabling MBA. What do you think?

> Pavel
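The "no sharing" policy described in the reply above (the sum of all
groups' bandwidths on a node must not exceed 100%) boils down to a
free-capacity check against the existing groups. Below is a minimal
sketch of that calculation, with hypothetical names rather than the
actual virResctrlAllocMB code:

    /* Minimal sketch of the "no sharing" MBA policy: compute the unused
     * bandwidth on one node by subtracting all existing groups, then
     * check a new request against it. Hypothetical names, not the
     * actual virresctrl implementation. */
    #include <stdio.h>

    static unsigned int
    unusedBandwidth(const unsigned int *groups, size_t ngroups)
    {
        unsigned int used = 0;
        size_t i;

        for (i = 0; i < ngroups; i++)
            used += groups[i];

        return used >= 100 ? 0 : 100 - used;
    }

    int main(void)
    {
        unsigned int existing[] = { 30, 20 };  /* two groups: 30% and 20% */
        unsigned int request = 60;             /* new group wants 60% */
        unsigned int avail = unusedBandwidth(existing, 2);

        if (request > avail)
            fprintf(stderr, "request %u%% exceeds free %u%%\n", request, avail);
        else
            printf("granting %u%% (free %u%%)\n", request, avail);
        return 0;
    }

Allowing sharing later would mean skipping the subtraction step, exactly
as the reply suggests.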
Hi Pavel,

On 2018-06-06 13:56, bing.niu wrote:
>> Then there is the domain XML; for CAT we use this:
>>
>>     [existing <cachetune>/<cache> XML as above]
>>
>> so the possible domain XML could look like this:
>>
>>     <domain>
>>       ...
>>       <cputune>
>>         ...
>>         <memory vcpus='0-3'>
>>           <socket id='0' bandwidth='30'/>
>>           <socket id='1' bandwidth='20'/>
>>         </memory>
>>         ...
>>       </cputune>
>>       ...
>>     </domain>
>>
>> Again, the element names 'memory' and 'socket' can be named
>> differently.
> 'socket' --> 'node'?
>
> Since the existing virresctrl implementation only cared about the
> cache part during development, we may need to change the names of some
> structures and functions when enabling MBA. What do you think?
>
>> Pavel

Is it possible to support MBA by extending CAT in the domain XML? Each
<cachetune> maps to one virResctrlAlloc structure and creates an
rdt_group in the resctrl fs, and each rdt_group has its own closid. This
works perfectly if only CAT is available. However, if MBA comes in with
CAT also enabled, like this:

    <domain>
      ...
      <cputune>
        ...
        <cachetune vcpus='0-3'>
          <cache id='0' level='3' type='both' size='3' unit='MiB'/>
          <cache id='1' level='3' type='both' size='3' unit='MiB'/>
        </cachetune>
        <memory vcpus='2-3'>
          <socket id='0' bandwidth='30'/>
          <socket id='1' bandwidth='20'/>
        </memory>
        ...
      </cputune>
      ...
    </domain>

we have to make sure the two allocations do not have overlapping vcpus,
like this:

    if (virBitmapOverlaps(def->cachetunes[i]->vcpus, vcpus) ||
        virBitmapOverlaps(def->memory->vcpus, vcpus)) {
        virReportError(VIR_ERR_XML_ERROR, "%s",
                       _("Overlapping vcpus in cachetunes"));
        goto cleanup;
    }

That looks like it introduces a dependency between CAT and MBA. Would it
be possible to rename cachetune so that CAT and MBA are handled together
in one section?

Thanks a lot,
Bing
On Wed, Jun 06, 2018 at 03:46:37PM +0800, bing.niu wrote:
> Hi Pavel,
>
> [...]
>
> Is it possible to support MBA by extending CAT in the domain XML? Each
> <cachetune> maps to one virResctrlAlloc structure and creates an
> rdt_group in the resctrl fs, and each rdt_group has its own closid.
> This works perfectly if only CAT is available. However, if MBA comes
> in with CAT also enabled, we have to make sure the two allocations do
> not have overlapping vcpus. That looks like it introduces a dependency
> between CAT and MBA. Would it be possible to rename cachetune so that
> CAT and MBA are handled together in one section?

I would like to avoid mixing cache tuning and memory bandwidth under the
'cachetune' element. But this is a good point; we need to make sure that
the 'vcpus' do not overlap.

Renaming an existing XML element is not possible; it needs to stay
backward compatible.

Pavel
On 2018-06-08 21:00, Pavel Hrdina wrote:
> On Wed, Jun 06, 2018 at 03:46:37PM +0800, bing.niu wrote:
>> [...]
>
> I would like to avoid mixing cache tuning and memory bandwidth under
> the 'cachetune' element. But this is a good point; we need to make
> sure that the 'vcpus' do not overlap.
>
> Renaming an existing XML element is not possible; it needs to stay
> backward compatible.
>
> Pavel

Thanks for the clarification ;). We will do a cross-check between
cachetune and memory to make sure no vcpus overlap. I just feel this may
confuse people, since we must tell them that the vcpus of cachetune and
memory cannot overlap but can be equal, and equal vcpus should be the
most common case: creating one rdt group per VM.
On Sat, Jun 09, 2018 at 03:13:56PM +0800, bing.niu wrote:
> [...]
>
> Thanks for the clarification ;). We will do a cross-check between
> cachetune and memory to make sure no vcpus overlap. I just feel this
> may confuse people, since we must tell them that the vcpus of
> cachetune and memory cannot overlap but can be equal, and equal vcpus
> should be the most common case: creating one rdt group per VM.

Yes, I don't like that confusion either. We will see how the code looks
and behaves, and whether it makes sense or not. If needed we can change
it.

Pavel
On Wed, Jun 06, 2018 at 01:56:53PM +0800, bing.niu wrote:
> Hi Pavel,
> Thanks for your valuable input. Please see my response.
>
> [...]
>
> > Thanks for the patches. Before we start with the actual
> > implementation it would be nice to agree on the design.
>
> Totally agree. The RFC code acts as a baseline for discussion.
>
> [...]
>
> Yes, the memory bandwidth allocation policy is derived from the
> existing CAT support in libvirt: no sharing or overlap. In the patch I
> follow the existing CAT behavior when allocating memory bandwidth:
> first, calculate the unused memory bandwidth by subtracting all
> existing RDT groups' allocations. If we want to enable memory
> bandwidth sharing, we can simply skip this part and do the allocation
> directly. Could this fit your comment "we need to do it in a way that
> in the future we can allow to 'share' the bandwidth"? If anything is
> missing or my understanding is incorrect, please point it out. :)

Sounds good to me.

> > [schemata vs. 'pqos -s' table as above]
>
> Hmm, that is strange. I manipulate the resctrl fs directly, so I
> didn't hit that kind of issue. I will take a look at the pqos package
> and let you know.

Yes, I was manipulating the resctrl fs directly as well, and after the
modification the content of the schemata file was correct. However, I
wanted to validate what was actually configured, and to do that I used
the 'pqos' tool. The question now is whether that tool is broken, or
whether for some values the actual configuration really differs from
what was written.

> Per Intel SDM 17.19, MBA is used to control the request rate for
> flushing data from the LLC to memory; usually MBA and the LLC have a
> 1:1 mapping relation. Yes, I missed the capability-exposing part.
> Thanks for pointing that out.
>
> > The element names 'memory' and 'bank' can be named differently;
> > suggestions are welcome.
>
> How about changing 'bank' to 'node'?

It definitely can be 'node'; my proposal was just to keep it consistent
with cache allocation.

> > Again, the element names 'memory' and 'socket' can be named
> > differently.
>
> 'socket' --> 'node'?

Same here, I'm OK with 'node'.

> Since the existing virresctrl implementation only cared about the
> cache part during development, we may need to change the names of some
> structures and functions when enabling MBA. What do you think?

Yes, the existing virresctrl implementation was mainly focused on cache
allocation. There is a new patch series with some cleanups for the
existing implementation which renames a few things, but we will probably
need to rename some other functions/structures/etc. One important note:
the rename should be done in a separate patch, without any functional
changes.

Pavel
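Whatever element names are finally chosen, the per-node bandwidth values
from the domain XML ultimately have to be serialized into a single "MB:"
line in the group's schemata file; for example, <socket id='0'
bandwidth='30'/> plus <socket id='1' bandwidth='20'/> becomes
"MB:0=30;1=20". Below is a minimal sketch of that serialization, with a
hypothetical helper name rather than the eventual virresctrl function:

    /* Minimal sketch: serialize per-node bandwidth percentages into one
     * "MB:" schemata line, e.g. "MB:0=30;1=20". Hypothetical helper,
     * not the eventual virresctrl function. */
    #include <stdio.h>

    static int
    formatMemoryBandwidthLine(const unsigned int *bandwidths,
                              size_t nnodes, char *buf, size_t buflen)
    {
        size_t off;
        size_t i;
        int n;

        n = snprintf(buf, buflen, "MB:");
        if (n < 0 || (size_t)n >= buflen)
            return -1;
        off = n;

        for (i = 0; i < nnodes; i++) {
            /* ';' separates nodes after the first one */
            n = snprintf(buf + off, buflen - off, "%s%zu=%u",
                         i ? ";" : "", i, bandwidths[i]);
            if (n < 0 || (size_t)n >= buflen - off)
                return -1;
            off += n;
        }
        return 0;
    }

    int main(void)
    {
        unsigned int bw[] = { 30, 20 };  /* node 0: 30%, node 1: 20% */
        char line[64];

        if (formatMemoryBandwidthLine(bw, 2, line, sizeof(line)) == 0)
            printf("%s\n", line);        /* prints: MB:0=30;1=20 */
        return 0;
    }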