From: Jialin Wang <wangjialin23@huawei.com>

Hi,

This patch series attempts to refactor RDMA live migration by introducing a new QIOChannelRDMA class based on the rsocket API.

The /usr/include/rdma/rsocket.h header provides a higher-level rsocket API that is a 1:1 match of the normal kernel sockets API. It hides the details of the RDMA protocol inside rsocket and makes it easier for us to add support for modern features such as multifd.

Here is the previous discussion on refactoring RDMA live migration using the rsocket API:

https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linaro.org/

We have encountered some bugs when using rsocket and plan to report them to the rdma-core community.

In addition, while rsocket makes our programming more convenient, note that this approach introduces multiple memory copies, so some performance degradation is to be expected. We hope that people with RDMA network cards can help verify this. Thank you!

Jialin Wang (6):
  migration: remove RDMA live migration temporarily
  io: add QIOChannelRDMA class
  io/channel-rdma: support working in coroutine
  tests/unit: add test-io-channel-rdma.c
  migration: introduce new RDMA live migration
  migration/rdma: support multifd for RDMA migration

 docs/rdma.txt                     |  420 ---
 include/io/channel-rdma.h         |  165 ++
 io/channel-rdma.c                 |  798 ++++++
 io/meson.build                    |    1 +
 io/trace-events                   |   14 +
 meson.build                       |    6 -
 migration/meson.build             |    3 +-
 migration/migration-stats.c       |    5 +-
 migration/migration-stats.h       |    4 -
 migration/migration.c             |   13 +-
 migration/migration.h             |    9 -
 migration/multifd.c               |   10 +
 migration/options.c               |   16 -
 migration/options.h               |    2 -
 migration/qemu-file.c             |    1 -
 migration/ram.c                   |   90 +-
 migration/rdma.c                  | 4205 +----------------------------
 migration/rdma.h                  |   67 +-
 migration/savevm.c                |    2 +-
 migration/trace-events            |   68 +-
 qapi/migration.json               |   13 +-
 scripts/analyze-migration.py      |    3 -
 tests/unit/meson.build            |    1 +
 tests/unit/test-io-channel-rdma.c |  276 ++
 24 files changed, 1360 insertions(+), 4832 deletions(-)
 delete mode 100644 docs/rdma.txt
 create mode 100644 include/io/channel-rdma.h
 create mode 100644 io/channel-rdma.c
 create mode 100644 tests/unit/test-io-channel-rdma.c

--
2.43.0
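Because rsocket.h mirrors the sockets API call for call, code written against a table of socket-style function pointers can, in principle, switch transports by swapping the table. The following is a minimal sketch of that idea, not code from this series: `struct sock_ops`, `bsd_ops` and `loopback_roundtrip()` are hypothetical names. The table is filled with ordinary BSD calls so the sketch runs over TCP loopback; an rsocket build would, assuming the declarations in rdma/rsocket.h, fill the same fields with `rsocket()`, `rbind()`, `rlisten()`, `raccept()`, `rconnect()`, `rgetsockname()`, `rsend()`, `rrecv()` and `rclose()` and link against librdmacm.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical ops table: each field has the signature of the matching
 * BSD socket call. An rsocket backend would plug in the r-prefixed
 * functions from <rdma/rsocket.h> instead (not shown: needs librdmacm). */
struct sock_ops {
    int (*socket)(int domain, int type, int protocol);
    int (*bind)(int fd, const struct sockaddr *addr, socklen_t len);
    int (*listen)(int fd, int backlog);
    int (*accept)(int fd, struct sockaddr *addr, socklen_t *len);
    int (*connect)(int fd, const struct sockaddr *addr, socklen_t len);
    int (*getsockname)(int fd, struct sockaddr *addr, socklen_t *len);
    ssize_t (*send)(int fd, const void *buf, size_t n, int flags);
    ssize_t (*recv)(int fd, void *buf, size_t n, int flags);
    int (*close)(int fd);
};

/* Thin wrappers keep the pointer types exact even if glibc declares the
 * sockaddr parameters as a transparent union (_GNU_SOURCE builds). */
static int bsd_bind(int fd, const struct sockaddr *a, socklen_t l)
{
    return bind(fd, a, l);
}
static int bsd_accept(int fd, struct sockaddr *a, socklen_t *l)
{
    return accept(fd, a, l);
}
static int bsd_connect(int fd, const struct sockaddr *a, socklen_t l)
{
    return connect(fd, a, l);
}
static int bsd_getsockname(int fd, struct sockaddr *a, socklen_t *l)
{
    return getsockname(fd, a, l);
}

static const struct sock_ops bsd_ops = {
    .socket = socket, .bind = bsd_bind, .listen = listen,
    .accept = bsd_accept, .connect = bsd_connect,
    .getsockname = bsd_getsockname,
    .send = send, .recv = recv, .close = close,
};

/* Sends msg from a client to a server over TCP loopback using only the
 * ops table. Stores the NUL-terminated payload the server received in
 * out and returns its length, or -1 on error. */
static ssize_t loopback_roundtrip(const struct sock_ops *ops, const char *msg,
                                  char *out, size_t outlen)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    size_t msglen = strlen(msg);
    ssize_t n = -1;
    int lfd, cfd = -1, afd = -1;

    assert(out != NULL && outlen > 0);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks */

    lfd = ops->socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) {
        return -1;
    }
    if (ops->bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        ops->listen(lfd, 1) < 0 ||
        ops->getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        goto cleanup;
    }
    cfd = ops->socket(AF_INET, SOCK_STREAM, 0);
    if (cfd < 0 ||
        ops->connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        goto cleanup;
    }
    afd = ops->accept(lfd, NULL, NULL);
    if (afd < 0 || ops->send(cfd, msg, msglen, 0) != (ssize_t)msglen) {
        goto cleanup;
    }
    n = ops->recv(afd, out, outlen - 1, 0);
    if (n >= 0) {
        out[n] = '\0';
    }
cleanup:
    if (afd >= 0) {
        ops->close(afd);
    }
    if (cfd >= 0) {
        ops->close(cfd);
    }
    ops->close(lfd);
    return n;
}
```

Only the table differs between backends; `loopback_roundtrip()` itself never names a transport, which is the property the "1:1 match" makes possible.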
Hi Gonglei, hi folks on the list,

On Tue, Jun 4, 2024 at 2:14 PM Gonglei <arei.gonglei@huawei.com> wrote:
> > From: Jialin Wang <wangjialin23@huawei.com> > > Hi, > > This patch series attempts to refactor RDMA live migration by > introducing a new QIOChannelRDMA class based on the rsocket API. > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > that is a 1-1 match of the normal kernel 'sockets' API, which hides the > detail of rdma protocol into rsocket and allows us to add support for > some modern features like multifd more easily. > > Here is the previous discussion on refactoring RDMA live migration using > the rsocket API: > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linaro.org/ > > We have encountered some bugs when using rsocket and plan to submit them to > the rdma-core community. > > In addition, the use of rsocket makes our programming more convenient, > but it must be noted that this method introduces multiple memory copies, > which can be imagined that there will be a certain performance degradation, > hoping that friends with RDMA network cards can help verify, thank you!

First, thanks for the effort. We are running migration tests on our InfiniBand (IB) fabric with different generations of Mellanox HCAs. The migration works OK, but there are a few failures; Yu will share the results separately later.

The one blocker for the change is that the old implementation and the new rsocket implementation don't talk to each other, because they use different wire protocols during connection establishment. For example, the old RDMA migration uses special control messages during the migration flow, whereas rsocket uses different control messages. As a result, there is no way to migrate a VM over the RDMA transport from a version predating the rsocket patchset to a new version with the rsocket implementation.

Probably we should keep both implementations for a while, mark the old implementation as deprecated, promote the new implementation, and highlight in the docs that they are not compatible.

Regards!
Jinpu
> -----Original Message----- > From: Jinpu Wang [mailto:jinpu.wang@ionos.com] > Sent: Friday, June 7, 2024 1:54 PM > To: Gonglei (Arei) <arei.gonglei@huawei.com> > Cc: qemu-devel@nongnu.org; peterx@redhat.com; yu.zhang@ionos.com; > mgalaxy@akamai.com; elmar.gerdes@ionos.com; zhengchuan > <zhengchuan@huawei.com>; berrange@redhat.com; armbru@redhat.com; > lizhijian@fujitsu.com; pbonzini@redhat.com; mst@redhat.com; Xiexiangyou > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > <lixiao91@huawei.com>; Wangjialin <wangjialin23@huawei.com> > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > Hi Gonglei, hi folks on the list, > > On Tue, Jun 4, 2024 at 2:14 PM Gonglei <arei.gonglei@huawei.com> wrote: > > > > From: Jialin Wang <wangjialin23@huawei.com> > > > > Hi, > > > > This patch series attempts to refactor RDMA live migration by > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > the detail of rdma protocol into rsocket and allows us to add support > > for some modern features like multifd more easily. > > > > Here is the previous discussion on refactoring RDMA live migration > > using the rsocket API: > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > o.org/ > > > > We have encountered some bugs when using rsocket and plan to submit > > them to the rdma-core community. > > > > In addition, the use of rsocket makes our programming more convenient, > > but it must be noted that this method introduces multiple memory > > copies, which can be imagined that there will be a certain performance > > degradation, hoping that friends with RDMA network cards can help verify, > thank you! 
> First thx for the effort, we are running migration tests on our IB fabric, different > generation of HCA from mellanox, the migration works ok, there are a few > failures, Yu will share the result later separately. >
Thank you so much.
> The one blocker for the change is the old implementation and the new rsocket > implementation; they don't talk to each other due to the effect of different wire > protocol during connection establishment. > eg the old RDMA migration has special control message during the migration > flow, which rsocket use a different control message, so there lead to no way to > migrate VM using rdma transport pre to the rsocket patchset to a new version > with rsocket implementation. > > Probably we should keep both implementation for a while, mark the old > implementation as deprecated, and promote the new implementation, and > high light in doc, they are not compatible. >
IMO it makes sense. What's your opinion, @Peter?

Regards,
-Gonglei

> Regards!
> Jinpu
On Fri, Jun 07, 2024 at 08:28:29AM +0000, Gonglei (Arei) wrote: > > > > -----Original Message----- > > From: Jinpu Wang [mailto:jinpu.wang@ionos.com] > > Sent: Friday, June 7, 2024 1:54 PM > > To: Gonglei (Arei) <arei.gonglei@huawei.com> > > Cc: qemu-devel@nongnu.org; peterx@redhat.com; yu.zhang@ionos.com; > > mgalaxy@akamai.com; elmar.gerdes@ionos.com; zhengchuan > > <zhengchuan@huawei.com>; berrange@redhat.com; armbru@redhat.com; > > lizhijian@fujitsu.com; pbonzini@redhat.com; mst@redhat.com; Xiexiangyou > > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > > <lixiao91@huawei.com>; Wangjialin <wangjialin23@huawei.com> > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > > > Hi Gonglei, hi folks on the list, > > > > On Tue, Jun 4, 2024 at 2:14 PM Gonglei <arei.gonglei@huawei.com> wrote: > > > > > > From: Jialin Wang <wangjialin23@huawei.com> > > > > > > Hi, > > > > > > This patch series attempts to refactor RDMA live migration by > > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > > the detail of rdma protocol into rsocket and allows us to add support > > > for some modern features like multifd more easily. > > > > > > Here is the previous discussion on refactoring RDMA live migration > > > using the rsocket API: > > > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > > o.org/ > > > > > > We have encountered some bugs when using rsocket and plan to submit > > > them to the rdma-core community. > > > > > > In addition, the use of rsocket makes our programming more convenient, > > > but it must be noted that this method introduces multiple memory > > > copies, which can be imagined that there will be a certain performance > > > degradation, hoping that friends with RDMA network cards can help verify, > > thank you! 
> > First thx for the effort, we are running migration tests on our IB fabric, different > > generation of HCA from mellanox, the migration works ok, there are a few > > failures, Yu will share the result later separately. > > > > Thank you so much. > > > The one blocker for the change is the old implementation and the new rsocket > > implementation; they don't talk to each other due to the effect of different wire > > protocol during connection establishment. > > eg the old RDMA migration has special control message during the migration > > flow, which rsocket use a different control message, so there lead to no way to > > migrate VM using rdma transport pre to the rsocket patchset to a new version > > with rsocket implementation. > > > > Probably we should keep both implementation for a while, mark the old > > implementation as deprecated, and promote the new implementation, and > > high light in doc, they are not compatible. > > > > IMO It makes sense. What's your opinion? @Peter.

Sounds good to me. We can use an internal property field to enable rsocket RDMA migration on new machine types when the rdma protocol is used, deprecating both the old RDMA implementation and that internal field after two releases. That way, when receiving RDMA migrations, the destination will use the old property value (as old QEMU will use old machine types), but when initiating an RDMA migration on a new binary, it will switch to rsocket.

It might be more important to address the failures or the performance concerns that others raised, though.

Thanks,

--
Peter Xu
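The machine-type approach can be modelled as a small decision function. This is a minimal sketch under stated assumptions, not QEMU code: QEMU normally expresses per-machine-type defaults through its `hw_compat_*` global property lists, which the sketch does not use. `RdmaProto`, `RdmaCompatEntry`, `rdma_compat[]` and `rdma_proto_for_machine()` are hypothetical names, and the 9.0 cut-off is a placeholder for whichever release would merge the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    RDMA_PROTO_LEGACY,   /* wire protocol of the pre-rsocket implementation */
    RDMA_PROTO_RSOCKET,  /* new rsocket-based protocol */
} RdmaProto;

/* Hypothetical compat entry: machine types up to and including
 * major.minor get the given value for the internal "use rsocket" field. */
typedef struct {
    int major;
    int minor;
    bool use_rsocket;
} RdmaCompatEntry;

/* Hypothetical cut-off: 9.0 is the last machine type that keeps the
 * legacy protocol. */
static const RdmaCompatEntry rdma_compat[] = {
    { 9, 0, false },
};

static bool machine_at_most(int major, int minor, int max_major, int max_minor)
{
    return major < max_major || (major == max_major && minor <= max_minor);
}

/* A new binary defaults the internal field to true; compat entries for
 * older machine types override it. Source and destination of a migration
 * run the same machine type, so both sides pick the same wire protocol. */
static RdmaProto rdma_proto_for_machine(int major, int minor)
{
    bool use_rsocket = true;
    size_t i;

    assert(major >= 0 && minor >= 0);
    for (i = 0; i < sizeof(rdma_compat) / sizeof(rdma_compat[0]); i++) {
        if (machine_at_most(major, minor, rdma_compat[i].major,
                            rdma_compat[i].minor)) {
            use_rsocket = rdma_compat[i].use_rsocket;
        }
    }
    return use_rsocket ? RDMA_PROTO_RSOCKET : RDMA_PROTO_LEGACY;
}
```

With this hypothetical cut-off, a new destination started with a 9.0 machine type to receive from an old source picks `RDMA_PROTO_LEGACY`, while the same binary with a 9.1 machine type picks `RDMA_PROTO_RSOCKET`. Deprecating the field later would amount to deleting the table entry and the legacy branch.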
On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > From: Jialin Wang <wangjialin23@huawei.com> > > Hi, > > This patch series attempts to refactor RDMA live migration by > introducing a new QIOChannelRDMA class based on the rsocket API. > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > that is a 1-1 match of the normal kernel 'sockets' API, which hides the > detail of rdma protocol into rsocket and allows us to add support for > some modern features like multifd more easily. > > Here is the previous discussion on refactoring RDMA live migration using > the rsocket API: > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linaro.org/ > > We have encountered some bugs when using rsocket and plan to submit them to > the rdma-core community. > > In addition, the use of rsocket makes our programming more convenient, > but it must be noted that this method introduces multiple memory copies, > which can be imagined that there will be a certain performance degradation, > hoping that friends with RDMA network cards can help verify, thank you! So you didn't test it with an RDMA card? You really should test with an RDMA card though, for correctness as much as performance. 
> -----Original Message----- > From: Michael S. Tsirkin [mailto:mst@redhat.com] > Sent: Wednesday, June 5, 2024 3:57 PM > To: Gonglei (Arei) <arei.gonglei@huawei.com> > Cc: qemu-devel@nongnu.org; peterx@redhat.com; yu.zhang@ionos.com; > mgalaxy@akamai.com; elmar.gerdes@ionos.com; zhengchuan > <zhengchuan@huawei.com>; berrange@redhat.com; armbru@redhat.com; > lizhijian@fujitsu.com; pbonzini@redhat.com; Xiexiangyou > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > <wangjialin23@huawei.com> > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > From: Jialin Wang <wangjialin23@huawei.com> > > > > Hi, > > > > This patch series attempts to refactor RDMA live migration by > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > the detail of rdma protocol into rsocket and allows us to add support > > for some modern features like multifd more easily. > > > > Here is the previous discussion on refactoring RDMA live migration > > using the rsocket API: > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > o.org/ > > > > We have encountered some bugs when using rsocket and plan to submit > > them to the rdma-core community. > > > > In addition, the use of rsocket makes our programming more convenient, > > but it must be noted that this method introduces multiple memory > > copies, which can be imagined that there will be a certain performance > > degradation, hoping that friends with RDMA network cards can help verify, > thank you! > > So you didn't test it with an RDMA card?
Yep, we tested it with Soft-RoCE.
> You really should test with an RDMA card though, for correctness as much as > performance. >
We will; we just don't have an environment with RDMA cards at hand at the moment.

Regards,
-Gonglei
Hello Gonglei,

Jinpu and I have tested your patchset using our migration test cases on physical RDMA cards. The result: of 59 migration test cases, 10 failed. These cases succeed with the original RDMA migration code, but always fail with the patchset. The syslog on the source server shows the following error:

Jun 6 13:35:20 ps402a-43 WARN: Migration failed uuid="44449999-3333-48dc-9082-1b6950e74ee1" target=2a02:247f:401:2:2:0:a:2c error=Failed(Unable to write to rsocket: Connection reset by peer)

We also tried to compare the migration speed with and without the patchset. Without the patchset, a big VM (16 cores, 64 GB memory) stressed with a heavy memory workload can be migrated successfully. With the patchset, only a small idle VM (1-2 cores, 2-4 GB memory) can be migrated successfully. In each failed migration, the above error is reported on the source server. Therefore, I assume that this version is not yet capable of handling heavy load. I'm also looking into the code to see if anything can be improved.

We really appreciate your excellent work!

Best regards,
Yu Zhang @ IONOS cloud

On Wed, Jun 5, 2024 at 12:00 PM Gonglei (Arei) <arei.gonglei@huawei.com> wrote:
> > > > > -----Original Message----- > > From: Michael S.
Tsirkin [mailto:mst@redhat.com] > > Sent: Wednesday, June 5, 2024 3:57 PM > > To: Gonglei (Arei) <arei.gonglei@huawei.com> > > Cc: qemu-devel@nongnu.org; peterx@redhat.com; yu.zhang@ionos.com; > > mgalaxy@akamai.com; elmar.gerdes@ionos.com; zhengchuan > > <zhengchuan@huawei.com>; berrange@redhat.com; armbru@redhat.com; > > lizhijian@fujitsu.com; pbonzini@redhat.com; Xiexiangyou > > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > > <wangjialin23@huawei.com> > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > > From: Jialin Wang <wangjialin23@huawei.com> > > > > > > Hi, > > > > > > This patch series attempts to refactor RDMA live migration by > > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > > the detail of rdma protocol into rsocket and allows us to add support > > > for some modern features like multifd more easily. > > > > > > Here is the previous discussion on refactoring RDMA live migration > > > using the rsocket API: > > > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > > o.org/ > > > > > > We have encountered some bugs when using rsocket and plan to submit > > > them to the rdma-core community. > > > > > > In addition, the use of rsocket makes our programming more convenient, > > > but it must be noted that this method introduces multiple memory > > > copies, which can be imagined that there will be a certain performance > > > degradation, hoping that friends with RDMA network cards can help verify, > > thank you! > > > > So you didn't test it with an RDMA card? > > Yep, we tested it by Soft-ROCE. 
> > > You really should test with an RDMA card though, for correctness as much as > > performance. > > > We will, we just don't have RDMA cards environment on hand at the moment. > > Regards, > -Gonglei >
On Wed, Jun 05, 2024 at 10:00:24AM +0000, Gonglei (Arei) wrote: > > > > -----Original Message----- > > From: Michael S. Tsirkin [mailto:mst@redhat.com] > > Sent: Wednesday, June 5, 2024 3:57 PM > > To: Gonglei (Arei) <arei.gonglei@huawei.com> > > Cc: qemu-devel@nongnu.org; peterx@redhat.com; yu.zhang@ionos.com; > > mgalaxy@akamai.com; elmar.gerdes@ionos.com; zhengchuan > > <zhengchuan@huawei.com>; berrange@redhat.com; armbru@redhat.com; > > lizhijian@fujitsu.com; pbonzini@redhat.com; Xiexiangyou > > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > > <wangjialin23@huawei.com> > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > > From: Jialin Wang <wangjialin23@huawei.com> > > > > > > Hi, > > > > > > This patch series attempts to refactor RDMA live migration by > > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > > the detail of rdma protocol into rsocket and allows us to add support > > > for some modern features like multifd more easily. > > > > > > Here is the previous discussion on refactoring RDMA live migration > > > using the rsocket API: > > > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > > o.org/ > > > > > > We have encountered some bugs when using rsocket and plan to submit > > > them to the rdma-core community. > > > > > > In addition, the use of rsocket makes our programming more convenient, > > > but it must be noted that this method introduces multiple memory > > > copies, which can be imagined that there will be a certain performance > > > degradation, hoping that friends with RDMA network cards can help verify, > > thank you! > > > > So you didn't test it with an RDMA card? 
> > Yep, we tested it by Soft-ROCE. Does Soft-RoCE (RXE) support live migration? Thanks
On 06/06/2024 19:31, Leon Romanovsky wrote: > On Wed, Jun 05, 2024 at 10:00:24AM +0000, Gonglei (Arei) wrote: >> >> >>> -----Original Message----- >>> From: Michael S. Tsirkin [mailto:mst@redhat.com] >>> Sent: Wednesday, June 5, 2024 3:57 PM >>> To: Gonglei (Arei) <arei.gonglei@huawei.com> >>> Cc: qemu-devel@nongnu.org; peterx@redhat.com; yu.zhang@ionos.com; >>> mgalaxy@akamai.com; elmar.gerdes@ionos.com; zhengchuan >>> <zhengchuan@huawei.com>; berrange@redhat.com; armbru@redhat.com; >>> lizhijian@fujitsu.com; pbonzini@redhat.com; Xiexiangyou >>> <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) >>> <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin >>> <wangjialin23@huawei.com> >>> Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API >>> >>> On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: >>>> From: Jialin Wang <wangjialin23@huawei.com> >>>> >>>> Hi, >>>> >>>> This patch series attempts to refactor RDMA live migration by >>>> introducing a new QIOChannelRDMA class based on the rsocket API. >>>> >>>> The /usr/include/rdma/rsocket.h provides a higher level rsocket API >>>> that is a 1-1 match of the normal kernel 'sockets' API, which hides >>>> the detail of rdma protocol into rsocket and allows us to add support >>>> for some modern features like multifd more easily. >>>> >>>> Here is the previous discussion on refactoring RDMA live migration >>>> using the rsocket API: >>>> >>>> https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar >>>> o.org/ >>>> >>>> We have encountered some bugs when using rsocket and plan to submit >>>> them to the rdma-core community. >>>> >>>> In addition, the use of rsocket makes our programming more convenient, >>>> but it must be noted that this method introduces multiple memory >>>> copies, which can be imagined that there will be a certain performance >>>> degradation, hoping that friends with RDMA network cards can help verify, >>> thank you! 
>>> >>> So you didn't test it with an RDMA card? >> >> Yep, we tested it by Soft-ROCE. > > Does Soft-RoCE (RXE) support live migration?

Yes, it does.

Thanks
Zhijian
On Wed, Jun 05, 2024 at 10:00:24AM +0000, Gonglei (Arei) wrote: > > > > -----Original Message----- > > From: Michael S. Tsirkin [mailto:mst@redhat.com] > > Sent: Wednesday, June 5, 2024 3:57 PM > > To: Gonglei (Arei) <arei.gonglei@huawei.com> > > Cc: qemu-devel@nongnu.org; peterx@redhat.com; yu.zhang@ionos.com; > > mgalaxy@akamai.com; elmar.gerdes@ionos.com; zhengchuan > > <zhengchuan@huawei.com>; berrange@redhat.com; armbru@redhat.com; > > lizhijian@fujitsu.com; pbonzini@redhat.com; Xiexiangyou > > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > > <wangjialin23@huawei.com> > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > > From: Jialin Wang <wangjialin23@huawei.com> > > > > > > Hi, > > > > > > This patch series attempts to refactor RDMA live migration by > > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > > the detail of rdma protocol into rsocket and allows us to add support > > > for some modern features like multifd more easily. > > > > > > Here is the previous discussion on refactoring RDMA live migration > > > using the rsocket API: > > > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > > o.org/ > > > > > > We have encountered some bugs when using rsocket and plan to submit > > > them to the rdma-core community. > > > > > > In addition, the use of rsocket makes our programming more convenient, > > > but it must be noted that this method introduces multiple memory > > > copies, which can be imagined that there will be a certain performance > > > degradation, hoping that friends with RDMA network cards can help verify, > > thank you! > > > > So you didn't test it with an RDMA card? 
> > Yep, we tested it by Soft-ROCE. > > > You really should test with an RDMA card though, for correctness as much as > > performance. > > > We will, we just don't have RDMA cards environment on hand at the moment. > > Regards, > -Gonglei

Until it's tested on real hardware, it is probably best to tag this series as RFC in the subject.
Hi, Lei, Jialin, Thanks a lot for working on this! I think we'll need to wait a bit for feedback from Jinpu and his team on the RDMA side, and from Daniel on the iochannels. Also, please remember to copy Fabiano Rosas on any relevant future posts; we'd like to know whether he has any comments too. I have copied him on this reply. On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > From: Jialin Wang <wangjialin23@huawei.com> > > Hi, > > This patch series attempts to refactor RDMA live migration by > introducing a new QIOChannelRDMA class based on the rsocket API. > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > that is a 1-1 match of the normal kernel 'sockets' API, which hides the > detail of rdma protocol into rsocket and allows us to add support for > some modern features like multifd more easily. > > Here is the previous discussion on refactoring RDMA live migration using > the rsocket API: > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linaro.org/ > > We have encountered some bugs when using rsocket and plan to submit them to > the rdma-core community. > > In addition, the use of rsocket makes our programming more convenient, > but it must be noted that this method introduces multiple memory copies, > which can be imagined that there will be a certain performance degradation, > hoping that friends with RDMA network cards can help verify, thank you! It would be good to elaborate on whether you tested it in-house. What numbers should people expect, exactly? Is that acceptable from Huawei's POV? Besides that, the code looks pretty good to me at first glance. Before others chime in, here are some high-level comments. Firstly, can we avoid using a coroutine for listen()? This might be relevant: I see that rdma_accept_incoming_migration() runs raccept() in a loop. Would that hang the QEMU main loop, even with the coroutine, until all channels are ready?
I'm not a coroutine person, but I think the hope is that we can make the dest QEMU run in a thread in the future, just like the src QEMU, so the less coroutine usage in this path, the better. I think I also left a comment elsewhere on whether it would be possible to allow iochannels to implement their own poll() functions, to avoid the per-channel poll thread proposed in this series. https://lore.kernel.org/r/ZldY21xVExtiMddB@x1n Personally, I think even with the thread proposal it's better than the old rdma code, but I still want to double-check with you guys. E.g., maybe that just won't work at all? Again, that also depends on first moving incoming migration into a thread to keep the dest QEMU main loop intact, I think, but I hope we will get there regardless of rdma; IOW, it would be nice if that happened even earlier. Another nitpick: qio_channel_rdma_listen_async() doesn't appear to be used and may be a candidate for removal. Thanks, -- Peter Xu
On Tue, Jun 04, 2024 at 03:32:19PM -0400, Peter Xu wrote: > Hi, Lei, Jialin, > > Thanks a lot for working on this! > > I think we'll need to wait a bit on feedbacks from Jinpu and his team on > RDMA side, also Daniel for iochannels. Also, please remember to copy > Fabiano Rosas in any relevant future posts. We'd also like to know whether > he has any comments too. I have him copied in this reply. I've not formally reviewed it, but I had a quick glance through the I/O channel patches and they all look sensible. Pretty much exactly what I was hoping it would end up looking like. > > In addition, the use of rsocket makes our programming more convenient, > > but it must be noted that this method introduces multiple memory copies, > > which can be imagined that there will be a certain performance degradation, > > hoping that friends with RDMA network cards can help verify, thank you! > > It'll be good to elaborate if you tested it in-house. What people should > expect on the numbers exactly? Is that okay from Huawei's POV? > > Besides that, the code looks pretty good at a first glance to me. Before snip > Personally I think even with the thread proposal it's better than the old > rdma code, but I just still want to double check with you guys. E.g., > maybe that just won't work at all? Again, that'll also be based on the > fact that we move migration incoming into a thread first to keep the dest > QEMU main loop intact, I think, but I hope we will reach that irrelevant of > rdma, IOW it'll be nice to happen even earlier if possible. Yes, from the migration code POV, this is a massive step forward: the RDMA integration is now completely trivial for migration code. The $million question is what the performance of this new implementation looks like on real hardware. As mentioned above, the extra memory copies will probably hurt performance compared to the old version.
We need the performance of the new RDMA impl to still be better than the plain TCP sockets backend to make it worthwhile having RDMA. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Hi Peter, > -----Original Message----- > From: Peter Xu [mailto:peterx@redhat.com] > Sent: Wednesday, June 5, 2024 3:32 AM > To: Gonglei (Arei) <arei.gonglei@huawei.com> > Cc: qemu-devel@nongnu.org; yu.zhang@ionos.com; mgalaxy@akamai.com; > elmar.gerdes@ionos.com; zhengchuan <zhengchuan@huawei.com>; > berrange@redhat.com; armbru@redhat.com; lizhijian@fujitsu.com; > pbonzini@redhat.com; mst@redhat.com; Xiexiangyou > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > <wangjialin23@huawei.com>; Fabiano Rosas <farosas@suse.de> > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > Hi, Lei, Jialin, > > Thanks a lot for working on this! > > I think we'll need to wait a bit on feedbacks from Jinpu and his team on RDMA > side, also Daniel for iochannels. Also, please remember to copy Fabiano > Rosas in any relevant future posts. We'd also like to know whether he has any > comments too. I have him copied in this reply. > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > From: Jialin Wang <wangjialin23@huawei.com> > > > > Hi, > > > > This patch series attempts to refactor RDMA live migration by > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > the detail of rdma protocol into rsocket and allows us to add support > > for some modern features like multifd more easily. > > > > Here is the previous discussion on refactoring RDMA live migration > > using the rsocket API: > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > o.org/ > > > > We have encountered some bugs when using rsocket and plan to submit > > them to the rdma-core community. 
> > > > In addition, the use of rsocket makes our programming more convenient, > > but it must be noted that this method introduces multiple memory > > copies, which can be imagined that there will be a certain performance > > degradation, hoping that friends with RDMA network cards can help verify, > thank you! > > It'll be good to elaborate if you tested it in-house. What people should expect > on the numbers exactly? Is that okay from Huawei's POV? > > Besides that, the code looks pretty good at a first glance to me. Before > others chim in, here're some high level comments.. > > Firstly, can we avoid using coroutine when listen()? Might be relevant when I > see that rdma_accept_incoming_migration() runs in a loop to do raccept(), but > would that also hang the qemu main loop even with the coroutine, before all > channels are ready? I'm not a coroutine person, but I think the hope is that > we can make dest QEMU run in a thread in the future just like the src QEMU, so > the less coroutine the better in this path. > Because the rsocket fd is set to non-blocking mode, raccept() returns EAGAIN when no connection has arrived yet; the coroutine then yields, so the QEMU main loop does not hang. > I think I also left a comment elsewhere on whether it would be possible to allow > iochannels implement their own poll() functions to avoid the per-channel poll > thread that is proposed in this series. > > https://lore.kernel.org/r/ZldY21xVExtiMddB@x1n > We noticed that, but it's a big change. I'm not sure it's a better way. > Personally I think even with the thread proposal it's better than the old rdma > code, but I just still want to double check with you guys. E.g., maybe that just > won't work at all? Again, that'll also be based on the fact that we move > migration incoming into a thread first to keep the dest QEMU main loop intact, > I think, but I hope we will reach that irrelevant of rdma, IOW it'll be nice to > happen even earlier if possible. > Yep.
This is a fairly big change; I wonder what other people suggest. > Another nitpick is that qio_channel_rdma_listen_async() doesn't look used and > may prone to removal. > Yes. I added qio_channel_rdma_listen_async() when writing the test case, because we wanted to test qio_channel_rdma_connect_async(). It is not used by the RDMA live migration code. Regards, -Gonglei
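A minimal sketch of the non-blocking accept pattern Gonglei describes, assuming plain POSIX sockets as a stand-in for rsocket (rsocket.h declares raccept() and rfcntl() as counterparts of accept() and fcntl(), but this sketch does not link librdmacm). The helpers set_nonblocking(), try_accept() and listen_loopback() are hypothetical and not taken from the series. On a non-blocking listener, accept() fails with EAGAIN/EWOULDBLOCK instead of blocking; in QEMU that is the point where a coroutine would yield until the fd becomes readable.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of a single non-blocking accept attempt. */
typedef enum { ACCEPT_OK, ACCEPT_WOULD_BLOCK, ACCEPT_ERROR } accept_status;

/* Switch fd to non-blocking mode (an rsocket fd would use rfcntl()). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) {
        return -1;
    }
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Try once to accept a connection; never blocks on a non-blocking fd.
 * ACCEPT_WOULD_BLOCK is where a coroutine-based caller would yield and
 * be re-entered when the listening fd becomes readable.
 */
static accept_status try_accept(int listen_fd, int *out_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd >= 0) {
        *out_fd = fd;
        return ACCEPT_OK;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return ACCEPT_WOULD_BLOCK;
    }
    return ACCEPT_ERROR;
}

/* Listen on 127.0.0.1 with a kernel-chosen port, reported via *port. */
static int listen_loopback(unsigned short *port)
{
    struct sockaddr_in addr = { 0 };
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}
```

With no client connected, try_accept() reports ACCEPT_WOULD_BLOCK; once a client has connected it returns ACCEPT_OK. The helper leaves the decision to yield, poll, or give up to the caller, which keeps it usable both from a coroutine and from a dedicated thread.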
On Wed, Jun 05, 2024 at 10:09:43AM +0000, Gonglei (Arei) wrote: > Hi Peter, > > > -----Original Message----- > > From: Peter Xu [mailto:peterx@redhat.com] > > Sent: Wednesday, June 5, 2024 3:32 AM > > To: Gonglei (Arei) <arei.gonglei@huawei.com> > > Cc: qemu-devel@nongnu.org; yu.zhang@ionos.com; mgalaxy@akamai.com; > > elmar.gerdes@ionos.com; zhengchuan <zhengchuan@huawei.com>; > > berrange@redhat.com; armbru@redhat.com; lizhijian@fujitsu.com; > > pbonzini@redhat.com; mst@redhat.com; Xiexiangyou > > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > > <wangjialin23@huawei.com>; Fabiano Rosas <farosas@suse.de> > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > > > Hi, Lei, Jialin, > > > > Thanks a lot for working on this! > > > > I think we'll need to wait a bit on feedbacks from Jinpu and his team on RDMA > > side, also Daniel for iochannels. Also, please remember to copy Fabiano > > Rosas in any relevant future posts. We'd also like to know whether he has any > > comments too. I have him copied in this reply. > > > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > > From: Jialin Wang <wangjialin23@huawei.com> > > > > > > Hi, > > > > > > This patch series attempts to refactor RDMA live migration by > > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket API > > > that is a 1-1 match of the normal kernel 'sockets' API, which hides > > > the detail of rdma protocol into rsocket and allows us to add support > > > for some modern features like multifd more easily. 
> > > > > > Here is the previous discussion on refactoring RDMA live migration > > > using the rsocket API: > > > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@linar > > > o.org/ > > > > > > We have encountered some bugs when using rsocket and plan to submit > > > them to the rdma-core community. > > > > > > In addition, the use of rsocket makes our programming more convenient, > > > but it must be noted that this method introduces multiple memory > > > copies, which can be imagined that there will be a certain performance > > > degradation, hoping that friends with RDMA network cards can help verify, > > thank you! > > > > It'll be good to elaborate if you tested it in-house. What people should expect > > on the numbers exactly? Is that okay from Huawei's POV? > > > > Besides that, the code looks pretty good at a first glance to me. Before > > others chim in, here're some high level comments.. > > > > Firstly, can we avoid using coroutine when listen()? Might be relevant when I > > see that rdma_accept_incoming_migration() runs in a loop to do raccept(), but > > would that also hang the qemu main loop even with the coroutine, before all > > channels are ready? I'm not a coroutine person, but I think the hope is that > > we can make dest QEMU run in a thread in the future just like the src QEMU, so > > the less coroutine the better in this path. > > > > Because rsocket is set to non-blocking, raccept will return EAGAIN when no connection > is received, coroutine will yield, and will not hang the qemu main loop. Ah, that's OK. I also just noticed it may not be a big deal as long as we're before migration_incoming_process(). I'm wondering whether it could be done similarly to what we do with sockets in qio_net_listener_set_client_func_full(). After all, rsocket aims to mimic the socket API, so it would make sense for the rsocket code to match the socket code, or even reuse it.
> > > I think I also left a comment elsewhere on whether it would be possible to allow > > iochannels implement their own poll() functions to avoid the per-channel poll > > thread that is proposed in this series. > > > > https://lore.kernel.org/r/ZldY21xVExtiMddB@x1n > > > > We noticed that, and it's a big operation. I'm not sure that's a better way. > > > Personally I think even with the thread proposal it's better than the old rdma > > code, but I just still want to double check with you guys. E.g., maybe that just > > won't work at all? Again, that'll also be based on the fact that we move > > migration incoming into a thread first to keep the dest QEMU main loop intact, > > I think, but I hope we will reach that irrelevant of rdma, IOW it'll be nice to > > happen even earlier if possible. > > > Yep. This is a fairly big change, I wonder what other people's suggestions are? Yes, we can wait for others' opinions. And btw, I'm not asking for it, and I don't think it'll block this approach from landing; as I said, this is better than the current code, so it's definitely an improvement to me. I'm purely curious: if you're not going to do it for rdma, maybe someday I'll try it myself, and I'd like to know what the "big change" would be, as I didn't dig further. It would help me if you could share what issues you've found. Thanks, -- Peter Xu
> -----Original Message----- > From: Peter Xu [mailto:peterx@redhat.com] > Sent: Wednesday, June 5, 2024 10:19 PM > To: Gonglei (Arei) <arei.gonglei@huawei.com> > Cc: qemu-devel@nongnu.org; yu.zhang@ionos.com; mgalaxy@akamai.com; > elmar.gerdes@ionos.com; zhengchuan <zhengchuan@huawei.com>; > berrange@redhat.com; armbru@redhat.com; lizhijian@fujitsu.com; > pbonzini@redhat.com; mst@redhat.com; Xiexiangyou > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > <wangjialin23@huawei.com>; Fabiano Rosas <farosas@suse.de> > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > On Wed, Jun 05, 2024 at 10:09:43AM +0000, Gonglei (Arei) wrote: > > Hi Peter, > > > > > -----Original Message----- > > > From: Peter Xu [mailto:peterx@redhat.com] > > > Sent: Wednesday, June 5, 2024 3:32 AM > > > To: Gonglei (Arei) <arei.gonglei@huawei.com> > > > Cc: qemu-devel@nongnu.org; yu.zhang@ionos.com; > mgalaxy@akamai.com; > > > elmar.gerdes@ionos.com; zhengchuan <zhengchuan@huawei.com>; > > > berrange@redhat.com; armbru@redhat.com; lizhijian@fujitsu.com; > > > pbonzini@redhat.com; mst@redhat.com; Xiexiangyou > > > <xiexiangyou@huawei.com>; linux-rdma@vger.kernel.org; lixiao (H) > > > <lixiao91@huawei.com>; jinpu.wang@ionos.com; Wangjialin > > > <wangjialin23@huawei.com>; Fabiano Rosas <farosas@suse.de> > > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on > > > rsocket API > > > > > > Hi, Lei, Jialin, > > > > > > Thanks a lot for working on this! > > > > > > I think we'll need to wait a bit on feedbacks from Jinpu and his > > > team on RDMA side, also Daniel for iochannels. Also, please > > > remember to copy Fabiano Rosas in any relevant future posts. We'd > > > also like to know whether he has any comments too. I have him copied in > this reply. 
> > > > > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > > > From: Jialin Wang <wangjialin23@huawei.com> > > > > > > > > Hi, > > > > > > > > This patch series attempts to refactor RDMA live migration by > > > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket > > > > API that is a 1-1 match of the normal kernel 'sockets' API, which > > > > hides the detail of rdma protocol into rsocket and allows us to > > > > add support for some modern features like multifd more easily. > > > > > > > > Here is the previous discussion on refactoring RDMA live migration > > > > using the rsocket API: > > > > > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@l > > > > inar > > > > o.org/ > > > > > > > > We have encountered some bugs when using rsocket and plan to > > > > submit them to the rdma-core community. > > > > > > > > In addition, the use of rsocket makes our programming more > > > > convenient, but it must be noted that this method introduces > > > > multiple memory copies, which can be imagined that there will be a > > > > certain performance degradation, hoping that friends with RDMA > > > > network cards can help verify, > > > thank you! > > > > > > It'll be good to elaborate if you tested it in-house. What people > > > should expect on the numbers exactly? Is that okay from Huawei's POV? > > > > > > Besides that, the code looks pretty good at a first glance to me. > > > Before others chim in, here're some high level comments.. > > > > > > Firstly, can we avoid using coroutine when listen()? Might be > > > relevant when I see that rdma_accept_incoming_migration() runs in a > > > loop to do raccept(), but would that also hang the qemu main loop > > > even with the coroutine, before all channels are ready? 
I'm not a > > > coroutine person, but I think the hope is that we can make dest QEMU > > > run in a thread in the future just like the src QEMU, so the less coroutine > the better in this path. > > > > > > > Because rsocket is set to non-blocking, raccept will return EAGAIN > > when no connection is received, coroutine will yield, and will not hang the > qemu main loop. > > Ah that's ok. And also I just noticed it may not be a big deal either as long as > we're before migration_incoming_process(). > > I'm wondering whether it can do it similarly like what we do with sockets in > qio_net_listener_set_client_func_full(). After all, rsocket wants to mimic the > socket API. It'll make sense if rsocket code tries to match with socket, or > even reuse. > Actually, we tried this approach, but it didn't work. Please see the known limitations in patch 3/6: For a blocking rsocket fd, if we use io_create_watch to wait for POLLIN or POLLOUT events, we cannot determine when the fd is not ready to read/write, as we can with non-blocking fds. Therefore, once an event occurs, it keeps firing, potentially leaving QEMU hanging. So we need to be cautious to avoid hangs when using io_create_watch. Regards, -Gonglei > > > > > I think I also left a comment elsewhere on whether it would be > > > possible to allow iochannels implement their own poll() functions to > > > avoid the per-channel poll thread that is proposed in this series. > > > > > > https://lore.kernel.org/r/ZldY21xVExtiMddB@x1n > > > > > > > We noticed that, and it's a big operation. I'm not sure that's a better way. > > > > > Personally I think even with the thread proposal it's better than > > > the old rdma code, but I just still want to double check with you > > > guys. E.g., maybe that just won't work at all?
Again, that'll also > > > be based on the fact that we move migration incoming into a thread > > > first to keep the dest QEMU main loop intact, I think, but I hope we > > > will reach that irrelevant of rdma, IOW it'll be nice to happen even earlier if > possible. > > > > > Yep. This is a fairly big change, I wonder what other people's suggestions > are? > > Yes we can wait for others' opinions. And btw I'm not asking for it and I don't > think it'll be a blocker for this approach to land, as I said this is better than the > current code so it's definitely an improvement to me. > > I'm purely curious, because if you're not going to do it for rdma, maybe > someday I'll try to do that, and I want to know what "big change" could be as I > didn't dig further. It may help me by sharing what issues you've found. > > Thanks, > > -- > Peter Xu
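The known limitation above follows from level-triggered readiness. The sketch below illustrates it with an ordinary pipe instead of an rsocket fd; this substitution lets the example run without RDMA hardware, and the sketch makes no attempt to reproduce rsocket internals. While unread data remains, poll() reports POLLIN on every call, so a watch on that fd keeps firing. On a non-blocking fd, the consumer can drain until read() fails with EAGAIN and so learn that the fd is no longer ready. On a blocking fd, that final read() would block instead. readable_now() and drain_nonblocking() are hypothetical helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Sample readiness without waiting: 1 if readable now, 0 if not, -1 on error. */
static int readable_now(int fd)
{
    struct pollfd p = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&p, 1, 0); /* timeout 0: do not block */

    if (n < 0) {
        return -1;
    }
    return n > 0 && (p.revents & POLLIN);
}

/*
 * Consume everything currently buffered on a NON-BLOCKING fd, stopping
 * when read() reports EAGAIN/EWOULDBLOCK (or EOF). Returns the number of
 * bytes consumed, or -1 on a real error. On a blocking fd the last read()
 * would block, which is why "not ready" cannot be detected that way.
 */
static long drain_nonblocking(int fd)
{
    char buf[256];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0) {
            return total; /* EOF: writer closed */
        }
        if (errno == EINTR) {
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total; /* drained: fd is no longer readable */
        }
        return -1;
    }
}
```

After three bytes are written to the pipe, readable_now() returns 1 on every call until the reader drains them; only after drain_nonblocking() hits EAGAIN does readiness drop back to 0.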
On Fri, Jun 07, 2024 at 08:49:01AM +0000, Gonglei (Arei) wrote: > Actually we tried this solution, but it didn't work. Pls see patch 3/6 > > Known limitations: > For a blocking rsocket fd, if we use io_create_watch to wait for > POLLIN or POLLOUT events, since the rsocket fd is blocking, we > cannot determine when it is not ready to read/write as we can with > non-blocking fds. Therefore, when an event occurs, it will occurs > always, potentially leave the qemu hanging. So we need be cautious > to avoid hanging when using io_create_watch . I'm not sure I fully get that part, though. In: https://lore.kernel.org/all/ZldY21xVExtiMddB@x1n/ I was thinking of the iochannel implementing its own poll with the _POLL flag, in which case it would call qio_channel_poll(), which should call rpoll() directly. So I didn't expect qio_channel_create_watch() to be used at all. I thought the context was that GMainLoop won't work with rsocket fds in general, but maybe I missed something. Thanks, -- Peter Xu
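Peter's suggestion is for the channel class to supply its own poll routine, so that waiting goes through rpoll() rather than a GLib watch. Below is a minimal sketch of that dispatch. It assumes a hypothetical demo_channel type with a poll_fn hook; this is neither QEMU's QIOChannel API nor code from the series. rpoll() in rsocket.h has the same signature as poll(2), so it could in principle be plugged into the hook, while channels without a hook fall back to poll(2). Here poll(2) itself stands in for rpoll() so the example runs without librdmacm.

```c
#include <poll.h>

/* Same shape as poll(2); rsocket's rpoll() shares this signature. */
typedef int (*chan_poll_fn)(struct pollfd *fds, nfds_t nfds, int timeout);

/* Hypothetical channel: an fd plus an optional class-specific poll hook. */
typedef struct {
    int fd;
    chan_poll_fn poll_fn; /* NULL means "use plain poll(2)" */
} demo_channel;

/*
 * Wait up to timeout_ms for `events` on the channel, dispatching to the
 * channel's own poll routine when it has one.
 * Returns the revents bitmask (> 0) when ready, 0 on timeout, -1 on error.
 */
static int demo_channel_wait(demo_channel *ch, short events, int timeout_ms)
{
    struct pollfd p = { .fd = ch->fd, .events = events, .revents = 0 };
    chan_poll_fn fn = ch->poll_fn ? ch->poll_fn : poll;
    int n = fn(&p, 1, timeout_ms);

    if (n <= 0) {
        return n;
    }
    return p.revents;
}
```

The caller never needs to know which poll implementation runs, which is the property that would let an rsocket-backed channel avoid both GLib watches and a per-channel poll thread. Whether QEMU's incoming-migration path can be structured that way is the open question in this thread.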