> -----Original Message-----
> From: Peter Xu <peterx@redhat.com>
> Sent: Thursday, March 28, 2024 11:22 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: farosas@suse.de; qemu-devel@nongnu.org; hao.xiang@bytedance.com;
> bryan.zhang@bytedance.com; Zou, Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH v5 0/7] Live Migration With IAA
>
> On Thu, Mar 28, 2024 at 03:02:30AM +0000, Liu, Yuan1 wrote:
> > Yes, I will support software fallback to ensure CI testing and users can
> > still use qpl compression without IAA hardware.
> >
> > Although the qpl software solution will have better performance than zlib,
> > I still don't think it has a greater advantage than zstd. I don't think
> > there is a need to add a migration option to configure the qpl software
> > or hardware path. So I will still only use QPL as an independent
> > compression in the next version, and no other migration options are needed.
>
> That should be fine.
>
> >
> > I will also add a guide to qpl-compression.rst about IAA permission
> > issues and how to determine whether the hardware path is available.
>
> OK.
>
> [...]
>
> > > > Yes, I use iperf3 to check the bandwidth for one core, the bandwidth
> > > > is 60Gbps.
> > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > [  5]   0.00-1.00   sec  7.00 GBytes  60.1 Gbits/sec    0   2.87 MBytes
> > > > [  5]   1.00-2.00   sec  7.05 GBytes  60.6 Gbits/sec    0   2.87 MBytes
> > > >
> > > > And in the live migration test, a multifd thread's CPU utilization is
> > > > almost 100%
> > >
> > > This 60Gbps per-channel is definitely impressive..
> > >
> > > Have you tried migration without multifd on your system? Would that also
> > > perform similarly v.s. 2 channels multifd?
> >
> > Simple test result below:
> > VM Type: 16vCPU, 64G memory
> > Workload in VM: fill 56G memory with Silesia data and vCPUs are idle
> > Migration Configurations:
> > 1. migrate_set_parameter max-bandwidth 100G
> > 2. migrate_set_parameter downtime-limit 300
> > 3. migrate_set_capability multifd on (multiFD test case)
> > 4. migrate_set_parameter multifd-channels 2 (multiFD test case)
> >
> >                   Totaltime (ms)  Downtime (ms)  Throughput (mbps)  Pages-per-second
> > without Multifd   23580           307            21221              689588
> > Multifd 2         7657            198            65410              2221176
>
> Thanks for the test results.
>
> So I am guessing the migration overheads besides pushing the socket is high
> enough to make it drop drastically, even if in this case zero detection
> shouldn't play a major role considering most of guest mem is pre-filled.

Yes, for non-multifd migration, besides the network stack overhead, the zero
page detection overhead (on both the source and the destination) is indeed
very high. Placing zero page detection in the multifd threads can reduce the
performance degradation caused by that overhead.

I also think migration doesn't need to detect zero pages with memcmp in all
cases; the detection only pays off when the VM's memory actually contains a
large number of zero pages.

My experience in this area may be insufficient, so I am working with Hao and
Bryan to see whether DSA hardware can be used to accelerate this part (both
zero page detection and writing zero pages). DSA is an accelerator for
detecting, filling, and comparing memory:
https://cdrdv2-public.intel.com/671116/341204-intel-data-streaming-accelerator-spec.pdf
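
For context on the "detect zero pages with memcmp" cost discussed above, a
rough per-page check looks like the sketch below. This is illustrative only,
not QEMU's actual code (QEMU uses an optimized buffer_is_zero() helper for
this); EXAMPLE_PAGE_SIZE and example_page_is_zero() are names made up for the
example.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define EXAMPLE_PAGE_SIZE 4096

    /* Naive per-page zero check: compare the page against an all-zero
     * buffer. This per-page memcmp is the CPU cost discussed above. */
    static bool example_page_is_zero(const uint8_t *page)
    {
        static const uint8_t zero_page[EXAMPLE_PAGE_SIZE];
        return memcmp(page, zero_page, EXAMPLE_PAGE_SIZE) == 0;
    }

    int main(void)
    {
        uint8_t page[EXAMPLE_PAGE_SIZE] = { 0 };

        printf("zero page? %d\n", example_page_is_zero(page));   /* prints 1 */
        page[100] = 0xff;
        printf("after write %d\n", example_page_is_zero(page));  /* prints 0 */
        return 0;
    }

Moving this per-page check into the multifd threads, or offloading it to a
memory-compare engine such as DSA, is what spreads this cost off the single
migration thread in the discussion above.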