vhost-user-fs: add capability to allow migration

[PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

Now any vhost-user-fs device makes VM unmigratable, that also prevents
qemu update without stopping the VM. In most cases that makes sense
because qemu has no way to transfer FUSE session state.

But we can give an option to orchestrator to override this if it can
guarantee that state will be preserved (e.g. it uses migration to
update qemu and dst will run on the same host as src and use the same
socket endpoints).

This patch keeps default behavior that prevents migration with such devices
but adds migration capability 'vhost-user-fs' to explicitly allow migration.

Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
---
 hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
 qapi/migration.json       |  7 ++++++-
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
index f5049735ac..13d920423e 100644
--- a/hw/virtio/vhost-user-fs.c
+++ b/hw/virtio/vhost-user-fs.c
@@ -24,6 +24,7 @@
 #include "hw/virtio/vhost-user-fs.h"
 #include "monitor/monitor.h"
 #include "sysemu/sysemu.h"
+#include "migration/migration.h"
 
 static const int user_feature_bits[] = {
     VIRTIO_F_VERSION_1,
@@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
     return &fs->vhost_dev;
 }
 
+static int vhost_user_fs_pre_save(void *opaque)
+{
+    MigrationState *s = migrate_get_current();
+
+    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
+        error_report("Migration of vhost-user-fs devices requires internal FUSE "
+                     "state of backend to be preserved. If orchestrator can "
+                     "guarantee this (e.g. dst connects to the same backend "
+                     "instance or backend state is migrated) set 'vhost-user-fs' "
+                     "migration capability to true to enable migration.");
+        return -1;
+    }
+
+    return 0;
+}
+
 static const VMStateDescription vuf_vmstate = {
     .name = "vhost-user-fs",
-    .unmigratable = 1,
+    .minimum_version_id = 0,
+    .version_id = 0,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+   .pre_save = vhost_user_fs_pre_save,
 };
 
 static Property vuf_properties[] = {
diff --git a/qapi/migration.json b/qapi/migration.json
index 88ecf86ac8..9a229ea884 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -477,6 +477,11 @@
 #                    will be handled faster.  This is a performance feature and
 #                    should not affect the correctness of postcopy migration.
 #                    (since 7.1)
+# @vhost-user-fs: If enabled, the migration process will allow migration of
+#                 vhost-user-fs devices, this should be enabled only when
+#                 backend can preserve local FUSE state e.g. for qemu update
+#                 when dst reconects to the same endpoints after migration.
+#                 (since 8.0)
 #
 # Features:
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -492,7 +497,7 @@
            'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
-           'zero-copy-send', 'postcopy-preempt'] }
+           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
 
 ##
 # @MigrationCapabilityStatus:
-- 
2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> qemu update without stopping the VM. In most cases that makes sense
> because qemu has no way to transfer FUSE session state.
> 
> But we can give an option to orchestrator to override this if it can
> guarantee that state will be preserved (e.g. it uses migration to
> update qemu and dst will run on the same host as src and use the same
> socket endpoints).
> 
> This patch keeps default behavior that prevents migration with such devices
> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> 
> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> ---
>  hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>  qapi/migration.json       |  7 ++++++-
>  2 files changed, 30 insertions(+), 2 deletions(-)

Hi Anton,
Sorry for holding up your work with the discussions that we had. I still
feel it's important to agree on command-line and/or vhost-user protocol
changes that will be able to support non-migratable, stateless
migration/reconnect, and stateful migration vhost-user-fs back-ends. All
three will exist.

As a next step, could you share your code that implements the QEMU side
of stateless migration?

I think that will make it clearer whether a command-line option
(migration capability or per-device) is sufficient or whether the
vhost-user protocol needs to be extended.

If the vhost-user protocol is extended then maybe no user-visible
changes are necessary. QEMU will know if the vhost-user-fs backend
supports migration and which type of migration. It can block migration
in cases where it's not possible.

Thanks,
Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 25/01/2023 21:46, Stefan Hajnoczi wrote:
> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>> qemu update without stopping the VM. In most cases that makes sense
>> because qemu has no way to transfer FUSE session state.
>>
>> But we can give an option to orchestrator to override this if it can
>> guarantee that state will be preserved (e.g. it uses migration to
>> update qemu and dst will run on the same host as src and use the same
>> socket endpoints).
>>
>> This patch keeps default behavior that prevents migration with such devices
>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>
>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>> ---
>>   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>   qapi/migration.json       |  7 ++++++-
>>   2 files changed, 30 insertions(+), 2 deletions(-)
> Hi Anton,
> Sorry for holding up your work with the discussions that we had. I still
> feel it's important to agree on command-line and/or vhost-user protocol
> changes that will be able to support non-migratable, stateless
> migration/reconnect, and stateful migration vhost-user-fs back-ends. All
> three will exist.
>
> As a next step, could you share your code that implements the QEMU side
> of stateless migration?
>
> I think that will make it clearer whether a command-line option
> (migration capability or per-device) is sufficient or whether the
> vhost-user protocol needs to be extended.
>
> If the vhost-user protocol is extended then maybe no user-visible
> changes are necessary. QEMU will know if the vhost-user-fs backend
> supports migration and which type of migration. It can block migration
> in cases where it's not possible.
>
> Thanks,
> Stefan


Thank you, Stefan,

That's OK. The discussion is very helpful and showed me some parts
that should to be checked to make sure no harm is done by this feature.
I needed some time to step back, review my approach to this feature
with all valuable feedback and ideas that were suggested and check
what other backend implementations can or can't do.
I'll answer today the emails with questions that were addressed to me.

This is all the code that QEMU needs to support stateless migration.
Do you mean backend and/or orchestrator parts?

I'll think of how protocol extension can help us make this feature
transparent to users.

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Thu, 26 Jan 2023 at 09:20, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>
> On 25/01/2023 21:46, Stefan Hajnoczi wrote:
> > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> >> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> >> qemu update without stopping the VM. In most cases that makes sense
> >> because qemu has no way to transfer FUSE session state.
> >>
> >> But we can give an option to orchestrator to override this if it can
> >> guarantee that state will be preserved (e.g. it uses migration to
> >> update qemu and dst will run on the same host as src and use the same
> >> socket endpoints).
> >>
> >> This patch keeps default behavior that prevents migration with such devices
> >> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> >>
> >> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> >> ---
> >>   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> >>   qapi/migration.json       |  7 ++++++-
> >>   2 files changed, 30 insertions(+), 2 deletions(-)
> > Hi Anton,
> > Sorry for holding up your work with the discussions that we had. I still
> > feel it's important to agree on command-line and/or vhost-user protocol
> > changes that will be able to support non-migratable, stateless
> > migration/reconnect, and stateful migration vhost-user-fs back-ends. All
> > three will exist.
> >
> > As a next step, could you share your code that implements the QEMU side
> > of stateless migration?
> >
> > I think that will make it clearer whether a command-line option
> > (migration capability or per-device) is sufficient or whether the
> > vhost-user protocol needs to be extended.
> >
> > If the vhost-user protocol is extended then maybe no user-visible
> > changes are necessary. QEMU will know if the vhost-user-fs backend
> > supports migration and which type of migration. It can block migration
> > in cases where it's not possible.
> >
> > Thanks,
> > Stefan
>
>
> Thank you, Stefan,
>
> That's OK. The discussion is very helpful and showed me some parts
> that should to be checked to make sure no harm is done by this feature.
> I needed some time to step back, review my approach to this feature
> with all valuable feedback and ideas that were suggested and check
> what other backend implementations can or can't do.
> I'll answer today the emails with questions that were addressed to me.
>
> This is all the code that QEMU needs to support stateless migration.
> Do you mean backend and/or orchestrator parts?

It's unclear to me how the destination QEMU is able to connect to the
vhost-user back-end while the source QEMU is still connected? I
thought additional QEMU changes would be necessary to make migration
handover work.

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 26/01/2023 17:13, Stefan Hajnoczi wrote:
> On Thu, 26 Jan 2023 at 09:20, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>> On 25/01/2023 21:46, Stefan Hajnoczi wrote:
>>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>>>> qemu update without stopping the VM. In most cases that makes sense
>>>> because qemu has no way to transfer FUSE session state.
>>>>
>>>> But we can give an option to orchestrator to override this if it can
>>>> guarantee that state will be preserved (e.g. it uses migration to
>>>> update qemu and dst will run on the same host as src and use the same
>>>> socket endpoints).
>>>>
>>>> This patch keeps default behavior that prevents migration with such devices
>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>>>
>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>>>> ---
>>>>    hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>>>    qapi/migration.json       |  7 ++++++-
>>>>    2 files changed, 30 insertions(+), 2 deletions(-)
>>> Hi Anton,
>>> Sorry for holding up your work with the discussions that we had. I still
>>> feel it's important to agree on command-line and/or vhost-user protocol
>>> changes that will be able to support non-migratable, stateless
>>> migration/reconnect, and stateful migration vhost-user-fs back-ends. All
>>> three will exist.
>>>
>>> As a next step, could you share your code that implements the QEMU side
>>> of stateless migration?
>>>
>>> I think that will make it clearer whether a command-line option
>>> (migration capability or per-device) is sufficient or whether the
>>> vhost-user protocol needs to be extended.
>>>
>>> If the vhost-user protocol is extended then maybe no user-visible
>>> changes are necessary. QEMU will know if the vhost-user-fs backend
>>> supports migration and which type of migration. It can block migration
>>> in cases where it's not possible.
>>>
>>> Thanks,
>>> Stefan
>>
>> Thank you, Stefan,
>>
>> That's OK. The discussion is very helpful and showed me some parts
>> that should to be checked to make sure no harm is done by this feature.
>> I needed some time to step back, review my approach to this feature
>> with all valuable feedback and ideas that were suggested and check
>> what other backend implementations can or can't do.
>> I'll answer today the emails with questions that were addressed to me.
>>
>> This is all the code that QEMU needs to support stateless migration.
>> Do you mean backend and/or orchestrator parts?
> It's unclear to me how the destination QEMU is able to connect to the
> vhost-user back-end while the source QEMU is still connected? I
> thought additional QEMU changes would be necessary to make migration
> handover work.
>
> Stefan

Oh, yes, that is exactly the part I'm checking right now: I was
testing the scenario when vm is saved to file and then restored from
file without host and destination running at the same time.

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Michael S. Tsirkin 3 years ago

On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> qemu update without stopping the VM. In most cases that makes sense
> because qemu has no way to transfer FUSE session state.
> 
> But we can give an option to orchestrator to override this if it can
> guarantee that state will be preserved (e.g. it uses migration to
> update qemu and dst will run on the same host as src and use the same
> socket endpoints).
> 
> This patch keeps default behavior that prevents migration with such devices
> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> 
> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> ---
>  hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>  qapi/migration.json       |  7 ++++++-
>  2 files changed, 30 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index f5049735ac..13d920423e 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -24,6 +24,7 @@
>  #include "hw/virtio/vhost-user-fs.h"
>  #include "monitor/monitor.h"
>  #include "sysemu/sysemu.h"
> +#include "migration/migration.h"
>  
>  static const int user_feature_bits[] = {
>      VIRTIO_F_VERSION_1,
> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>      return &fs->vhost_dev;
>  }
>  
> +static int vhost_user_fs_pre_save(void *opaque)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> +                     "state of backend to be preserved. If orchestrator can "
> +                     "guarantee this (e.g. dst connects to the same backend "
> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> +                     "migration capability to true to enable migration.");

Isn't it possible that some backends are same and some are not?
Shouldn't this be a device property then?



> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  static const VMStateDescription vuf_vmstate = {
>      .name = "vhost-user-fs",
> -    .unmigratable = 1,
> +    .minimum_version_id = 0,
> +    .version_id = 0,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_VIRTIO_DEVICE,
> +        VMSTATE_END_OF_LIST()
> +    },
> +   .pre_save = vhost_user_fs_pre_save,
>  };
>  
>  static Property vuf_properties[] = {
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88ecf86ac8..9a229ea884 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -477,6 +477,11 @@
>  #                    will be handled faster.  This is a performance feature and
>  #                    should not affect the correctness of postcopy migration.
>  #                    (since 7.1)
> +# @vhost-user-fs: If enabled, the migration process will allow migration of
> +#                 vhost-user-fs devices, this should be enabled only when
> +#                 backend can preserve local FUSE state e.g. for qemu update
> +#                 when dst reconects to the same endpoints after migration.
> +#                 (since 8.0)
>  #
>  # Features:
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> @@ -492,7 +497,7 @@
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
> -           'zero-copy-send', 'postcopy-preempt'] }
> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }

I kind of dislike that it's such a specific flag. Is only vhost-user-fs
ever going to be affected? Any way to put it in a way that is more generic?


>  ##
>  # @MigrationCapabilityStatus:
> -- 
> 2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 19/01/2023 14:51, Michael S. Tsirkin wrote:
> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>> qemu update without stopping the VM. In most cases that makes sense
>> because qemu has no way to transfer FUSE session state.
>>
>> But we can give an option to orchestrator to override this if it can
>> guarantee that state will be preserved (e.g. it uses migration to
>> update qemu and dst will run on the same host as src and use the same
>> socket endpoints).
>>
>> This patch keeps default behavior that prevents migration with such devices
>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>
>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>> ---
>>   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>   qapi/migration.json       |  7 ++++++-
>>   2 files changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
>> index f5049735ac..13d920423e 100644
>> --- a/hw/virtio/vhost-user-fs.c
>> +++ b/hw/virtio/vhost-user-fs.c
>> @@ -24,6 +24,7 @@
>>   #include "hw/virtio/vhost-user-fs.h"
>>   #include "monitor/monitor.h"
>>   #include "sysemu/sysemu.h"
>> +#include "migration/migration.h"
>>   
>>   static const int user_feature_bits[] = {
>>       VIRTIO_F_VERSION_1,
>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>>       return &fs->vhost_dev;
>>   }
>>   
>> +static int vhost_user_fs_pre_save(void *opaque)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +
>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>> +                     "state of backend to be preserved. If orchestrator can "
>> +                     "guarantee this (e.g. dst connects to the same backend "
>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>> +                     "migration capability to true to enable migration.");
> Isn't it possible that some backends are same and some are not?
> Shouldn't this be a device property then?
If some are not the same it is not guaranteed that correct FUSE
state is present there, so orchestrator shouldn't set the capability
because this can result in destination devices being broken (they'll
be fine after the remount in guest, but this is guest visible and is
not acceptable).

I can imagine smart orchestrator and backend that can transfer
internal FUSE state, but we are not there yet, and this would be
their responsibility then to ensure endpoint compatibility between src
and dst and set the capability (that's why I put "e.g." and "or" in
the error description).

>
>
>
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static const VMStateDescription vuf_vmstate = {
>>       .name = "vhost-user-fs",
>> -    .unmigratable = 1,
>> +    .minimum_version_id = 0,
>> +    .version_id = 0,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_VIRTIO_DEVICE,
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +   .pre_save = vhost_user_fs_pre_save,
>>   };
>>   
>>   static Property vuf_properties[] = {
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 88ecf86ac8..9a229ea884 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -477,6 +477,11 @@
>>   #                    will be handled faster.  This is a performance feature and
>>   #                    should not affect the correctness of postcopy migration.
>>   #                    (since 7.1)
>> +# @vhost-user-fs: If enabled, the migration process will allow migration of
>> +#                 vhost-user-fs devices, this should be enabled only when
>> +#                 backend can preserve local FUSE state e.g. for qemu update
>> +#                 when dst reconects to the same endpoints after migration.
>> +#                 (since 8.0)
>>   #
>>   # Features:
>>   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>> @@ -492,7 +497,7 @@
>>              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>>              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>              'validate-uuid', 'background-snapshot',
>> -           'zero-copy-send', 'postcopy-preempt'] }
>> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
> I kind of dislike that it's such a specific flag. Is only vhost-user-fs
> ever going to be affected? Any way to put it in a way that is more generic?
Here I agree with you: I would prefer less narrow naming too. But I
didn't manage to come up with one. Looks like many other vhost-user
devices could benefit from this so maybe "vhost-user-stateless" or
something like this would be better.
I'm not sure that other types of devices could handle reconnect to
the old endpoint as easy as vhost-user-fs, but anyway the support for
this flag needs to be implemented for each device individually.
What do you think? Any ideas would be appreciated.

>
>
>>   ##
>>   # @MigrationCapabilityStatus:
>> -- 
>> 2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Michael S. Tsirkin 3 years ago

On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:
> On 19/01/2023 14:51, Michael S. Tsirkin wrote:
> > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> > > Now any vhost-user-fs device makes VM unmigratable, that also prevents
> > > qemu update without stopping the VM. In most cases that makes sense
> > > because qemu has no way to transfer FUSE session state.
> > > 
> > > But we can give an option to orchestrator to override this if it can
> > > guarantee that state will be preserved (e.g. it uses migration to
> > > update qemu and dst will run on the same host as src and use the same
> > > socket endpoints).
> > > 
> > > This patch keeps default behavior that prevents migration with such devices
> > > but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> > > 
> > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> > > ---
> > >   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> > >   qapi/migration.json       |  7 ++++++-
> > >   2 files changed, 30 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > index f5049735ac..13d920423e 100644
> > > --- a/hw/virtio/vhost-user-fs.c
> > > +++ b/hw/virtio/vhost-user-fs.c
> > > @@ -24,6 +24,7 @@
> > >   #include "hw/virtio/vhost-user-fs.h"
> > >   #include "monitor/monitor.h"
> > >   #include "sysemu/sysemu.h"
> > > +#include "migration/migration.h"
> > >   static const int user_feature_bits[] = {
> > >       VIRTIO_F_VERSION_1,
> > > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> > >       return &fs->vhost_dev;
> > >   }
> > > +static int vhost_user_fs_pre_save(void *opaque)
> > > +{
> > > +    MigrationState *s = migrate_get_current();
> > > +
> > > +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> > > +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> > > +                     "state of backend to be preserved. If orchestrator can "
> > > +                     "guarantee this (e.g. dst connects to the same backend "
> > > +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> > > +                     "migration capability to true to enable migration.");
> > Isn't it possible that some backends are same and some are not?
> > Shouldn't this be a device property then?
> If some are not the same it is not guaranteed that correct FUSE
> state is present there, so orchestrator shouldn't set the capability
> because this can result in destination devices being broken (they'll
> be fine after the remount in guest, but this is guest visible and is
> not acceptable).
> 
> I can imagine smart orchestrator and backend that can transfer
> internal FUSE state, but we are not there yet, and this would be
> their responsibility then to ensure endpoint compatibility between src
> and dst and set the capability (that's why I put "e.g." and "or" in
> the error description).

So instead of relying on the orchestrator how about making it a device
property?


> > 
> > 
> > 
> > > +        return -1;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >   static const VMStateDescription vuf_vmstate = {
> > >       .name = "vhost-user-fs",
> > > -    .unmigratable = 1,
> > > +    .minimum_version_id = 0,
> > > +    .version_id = 0,
> > > +    .fields = (VMStateField[]) {
> > > +        VMSTATE_VIRTIO_DEVICE,
> > > +        VMSTATE_END_OF_LIST()
> > > +    },
> > > +   .pre_save = vhost_user_fs_pre_save,
> > >   };
> > >   static Property vuf_properties[] = {
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 88ecf86ac8..9a229ea884 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -477,6 +477,11 @@
> > >   #                    will be handled faster.  This is a performance feature and
> > >   #                    should not affect the correctness of postcopy migration.
> > >   #                    (since 7.1)
> > > +# @vhost-user-fs: If enabled, the migration process will allow migration of
> > > +#                 vhost-user-fs devices, this should be enabled only when
> > > +#                 backend can preserve local FUSE state e.g. for qemu update
> > > +#                 when dst reconects to the same endpoints after migration.
> > > +#                 (since 8.0)
> > >   #
> > >   # Features:
> > >   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> > > @@ -492,7 +497,7 @@
> > >              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> > >              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> > >              'validate-uuid', 'background-snapshot',
> > > -           'zero-copy-send', 'postcopy-preempt'] }
> > > +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
> > I kind of dislike that it's such a specific flag. Is only vhost-user-fs
> > ever going to be affected? Any way to put it in a way that is more generic?
> Here I agree with you: I would prefer less narrow naming too. But I
> didn't manage to come up with one. Looks like many other vhost-user
> devices could benefit from this so maybe "vhost-user-stateless" or
> something like this would be better.
> I'm not sure that other types of devices could handle reconnect to
> the old endpoint as easy as vhost-user-fs, but anyway the support for
> this flag needs to be implemented for each device individually.
> What do you think? Any ideas would be appreciated.

Let's try to create a better description of when this flag should be
set. Then shorten it up to create the name.

> > 
> > 
> > >   ##
> > >   # @MigrationCapabilityStatus:
> > > -- 
> > > 2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 20/01/2023 15:58, Michael S. Tsirkin wrote:
> On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:
>> On 19/01/2023 14:51, Michael S. Tsirkin wrote:
>>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>>>> qemu update without stopping the VM. In most cases that makes sense
>>>> because qemu has no way to transfer FUSE session state.
>>>>
>>>> But we can give an option to orchestrator to override this if it can
>>>> guarantee that state will be preserved (e.g. it uses migration to
>>>> update qemu and dst will run on the same host as src and use the same
>>>> socket endpoints).
>>>>
>>>> This patch keeps default behavior that prevents migration with such devices
>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>>>
>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>>>> ---
>>>>    hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>>>    qapi/migration.json       |  7 ++++++-
>>>>    2 files changed, 30 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
>>>> index f5049735ac..13d920423e 100644
>>>> --- a/hw/virtio/vhost-user-fs.c
>>>> +++ b/hw/virtio/vhost-user-fs.c
>>>> @@ -24,6 +24,7 @@
>>>>    #include "hw/virtio/vhost-user-fs.h"
>>>>    #include "monitor/monitor.h"
>>>>    #include "sysemu/sysemu.h"
>>>> +#include "migration/migration.h"
>>>>    static const int user_feature_bits[] = {
>>>>        VIRTIO_F_VERSION_1,
>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>>>>        return &fs->vhost_dev;
>>>>    }
>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>> +{
>>>> +    MigrationState *s = migrate_get_current();
>>>> +
>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>> +                     "migration capability to true to enable migration.");
>>> Isn't it possible that some backends are same and some are not?
>>> Shouldn't this be a device property then?
>> If some are not the same it is not guaranteed that correct FUSE
>> state is present there, so orchestrator shouldn't set the capability
>> because this can result in destination devices being broken (they'll
>> be fine after the remount in guest, but this is guest visible and is
>> not acceptable).
>>
>> I can imagine smart orchestrator and backend that can transfer
>> internal FUSE state, but we are not there yet, and this would be
>> their responsibility then to ensure endpoint compatibility between src
>> and dst and set the capability (that's why I put "e.g." and "or" in
>> the error description).
> So instead of relying on the orchestrator how about making it a device
> property?

We have to rely on the orchestrator here and I can't see how a property
can help us with this: same device can be migratable or not depending
on if the destination is the same host, what features backend supports,
how management software works and other factors of environment that
are not accessible from qemu or backend daemon.
So in the end we'll end up with orchestrator having to setup flags for
each device before each migration based on information only it can
have - in my opinion this is worse than just giving the orchestrator
a single flag that it can set after calculating the decision for
the particular migration that it organizes.

>
>
>>>
>>>
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    static const VMStateDescription vuf_vmstate = {
>>>>        .name = "vhost-user-fs",
>>>> -    .unmigratable = 1,
>>>> +    .minimum_version_id = 0,
>>>> +    .version_id = 0,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    },
>>>> +   .pre_save = vhost_user_fs_pre_save,
>>>>    };
>>>>    static Property vuf_properties[] = {
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 88ecf86ac8..9a229ea884 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -477,6 +477,11 @@
>>>>    #                    will be handled faster.  This is a performance feature and
>>>>    #                    should not affect the correctness of postcopy migration.
>>>>    #                    (since 7.1)
>>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of
>>>> +#                 vhost-user-fs devices, this should be enabled only when
>>>> +#                 backend can preserve local FUSE state e.g. for qemu update
>>>> +#                 when dst reconects to the same endpoints after migration.
>>>> +#                 (since 8.0)
>>>>    #
>>>>    # Features:
>>>>    # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>>>> @@ -492,7 +497,7 @@
>>>>               'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>>>>               { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>>>               'validate-uuid', 'background-snapshot',
>>>> -           'zero-copy-send', 'postcopy-preempt'] }
>>>> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
>>> I kind of dislike that it's such a specific flag. Is only vhost-user-fs
>>> ever going to be affected? Any way to put it in a way that is more generic?
>> Here I agree with you: I would prefer less narrow naming too. But I
>> didn't manage to come up with one. Looks like many other vhost-user
>> devices could benefit from this so maybe "vhost-user-stateless" or
>> something like this would be better.
>> I'm not sure that other types of devices could handle reconnect to
>> the old endpoint as easy as vhost-user-fs, but anyway the support for
>> this flag needs to be implemented for each device individually.
>> What do you think? Any ideas would be appreciated.
> Let's try to create a better description of when this flag should be
> set. Then shorten it up to create the name.

This flag should be set when qemu don't need to worry about any
external state stored in vhost-user daemons during migration:
don't fail migration, just pack generic virtio device states to
migration stream and orchestrator guarantees that the rest of the
state will be present at the destination to restore full context and
continue running.

>
>>>
>>>>    ##
>>>>    # @MigrationCapabilityStatus:
>>>> -- 
>>>> 2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Michael S. Tsirkin 3 years ago

On Fri, Jan 20, 2023 at 07:37:18PM +0200, Anton Kuchin wrote:
> On 20/01/2023 15:58, Michael S. Tsirkin wrote:
> > On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:
> > > On 19/01/2023 14:51, Michael S. Tsirkin wrote:
> > > > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> > > > > Now any vhost-user-fs device makes VM unmigratable, that also prevents
> > > > > qemu update without stopping the VM. In most cases that makes sense
> > > > > because qemu has no way to transfer FUSE session state.
> > > > > 
> > > > > But we can give an option to orchestrator to override this if it can
> > > > > guarantee that state will be preserved (e.g. it uses migration to
> > > > > update qemu and dst will run on the same host as src and use the same
> > > > > socket endpoints).
> > > > > 
> > > > > This patch keeps default behavior that prevents migration with such devices
> > > > > but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> > > > > 
> > > > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> > > > > ---
> > > > >    hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> > > > >    qapi/migration.json       |  7 ++++++-
> > > > >    2 files changed, 30 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > > > index f5049735ac..13d920423e 100644
> > > > > --- a/hw/virtio/vhost-user-fs.c
> > > > > +++ b/hw/virtio/vhost-user-fs.c
> > > > > @@ -24,6 +24,7 @@
> > > > >    #include "hw/virtio/vhost-user-fs.h"
> > > > >    #include "monitor/monitor.h"
> > > > >    #include "sysemu/sysemu.h"
> > > > > +#include "migration/migration.h"
> > > > >    static const int user_feature_bits[] = {
> > > > >        VIRTIO_F_VERSION_1,
> > > > > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> > > > >        return &fs->vhost_dev;
> > > > >    }
> > > > > +static int vhost_user_fs_pre_save(void *opaque)
> > > > > +{
> > > > > +    MigrationState *s = migrate_get_current();
> > > > > +
> > > > > +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> > > > > +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> > > > > +                     "state of backend to be preserved. If orchestrator can "
> > > > > +                     "guarantee this (e.g. dst connects to the same backend "
> > > > > +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> > > > > +                     "migration capability to true to enable migration.");
> > > > Isn't it possible that some backends are same and some are not?
> > > > Shouldn't this be a device property then?
> > > If some are not the same it is not guaranteed that correct FUSE
> > > state is present there, so orchestrator shouldn't set the capability
> > > because this can result in destination devices being broken (they'll
> > > be fine after the remount in guest, but this is guest visible and is
> > > not acceptable).
> > > 
> > > I can imagine smart orchestrator and backend that can transfer
> > > internal FUSE state, but we are not there yet, and this would be
> > > their responsibility then to ensure endpoint compatibility between src
> > > and dst and set the capability (that's why I put "e.g." and "or" in
> > > the error description).
> > So instead of relying on the orchestrator how about making it a device
> > property?
> 
> We have to rely on the orchestrator here and I can't see how a property
> can help us with this: same device can be migratable or not depending
> on if the destination is the same host, what features backend supports,
> how management software works and other factors of environment that
> are not accessible from qemu or backend daemon.
> So in the end we'll end up with orchestrator having to setup flags for
> each device before each migration based on information only it can
> have - in my opinion this is worse than just giving the orchestrator
> a single flag that it can set after calculating the decision for
> the particular migration that it organizes.
> 
> > 
> > 
> > > > 
> > > > 
> > > > > +        return -1;
> > > > > +    }
> > > > > +
> > > > > +    return 0;
> > > > > +}
> > > > > +
> > > > >    static const VMStateDescription vuf_vmstate = {
> > > > >        .name = "vhost-user-fs",
> > > > > -    .unmigratable = 1,
> > > > > +    .minimum_version_id = 0,
> > > > > +    .version_id = 0,
> > > > > +    .fields = (VMStateField[]) {
> > > > > +        VMSTATE_VIRTIO_DEVICE,
> > > > > +        VMSTATE_END_OF_LIST()
> > > > > +    },
> > > > > +   .pre_save = vhost_user_fs_pre_save,
> > > > >    };
> > > > >    static Property vuf_properties[] = {
> > > > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > > > index 88ecf86ac8..9a229ea884 100644
> > > > > --- a/qapi/migration.json
> > > > > +++ b/qapi/migration.json
> > > > > @@ -477,6 +477,11 @@
> > > > >    #                    will be handled faster.  This is a performance feature and
> > > > >    #                    should not affect the correctness of postcopy migration.
> > > > >    #                    (since 7.1)
> > > > > +# @vhost-user-fs: If enabled, the migration process will allow migration of
> > > > > +#                 vhost-user-fs devices, this should be enabled only when
> > > > > +#                 backend can preserve local FUSE state e.g. for qemu update
> > > > > +#                 when dst reconects to the same endpoints after migration.
> > > > > +#                 (since 8.0)
> > > > >    #
> > > > >    # Features:
> > > > >    # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> > > > > @@ -492,7 +497,7 @@
> > > > >               'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> > > > >               { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> > > > >               'validate-uuid', 'background-snapshot',
> > > > > -           'zero-copy-send', 'postcopy-preempt'] }
> > > > > +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
> > > > I kind of dislike that it's such a specific flag. Is only vhost-user-fs
> > > > ever going to be affected? Any way to put it in a way that is more generic?
> > > Here I agree with you: I would prefer less narrow naming too. But I
> > > didn't manage to come up with one. Looks like many other vhost-user
> > > devices could benefit from this so maybe "vhost-user-stateless" or
> > > something like this would be better.
> > > I'm not sure that other types of devices could handle reconnect to
> > > the old endpoint as easy as vhost-user-fs, but anyway the support for
> > > this flag needs to be implemented for each device individually.
> > > What do you think? Any ideas would be appreciated.
> > Let's try to create a better description of when this flag should be
> > set. Then shorten it up to create the name.
> 
> This flag should be set when qemu don't need to worry about any
> external state stored in vhost-user daemons during migration:
> don't fail migration, just pack generic virtio device states to
> migration stream and orchestrator guarantees that the rest of the
> state will be present at the destination to restore full context and
> continue running.

Sorry  I still do not get it.  So fundamentally, why do we need this property?
vhost-user-fs is not created by default that we'd then need opt-in to
the special "migrateable" case.
That's why I said it might make some sense as a device property as qemu
tracks whether device is unplugged for us.

But as written, if you are going to teach the orchestrator about
vhost-user-fs and its special needs, just teach it when to migrate and
where not to migrate.

Either we describe the special situation to qemu and let qemu
make an intelligent decision whether to allow migration,
or we trust the orchestrator. And if it's the latter, then 'migrate'
already says orchestrator decided to migrate.

> > 
> > > > 
> > > > >    ##
> > > > >    # @MigrationCapabilityStatus:
> > > > > -- 
> > > > > 2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 22/01/2023 10:16, Michael S. Tsirkin wrote:
> On Fri, Jan 20, 2023 at 07:37:18PM +0200, Anton Kuchin wrote:
>> On 20/01/2023 15:58, Michael S. Tsirkin wrote:
>>> On Thu, Jan 19, 2023 at 03:45:06PM +0200, Anton Kuchin wrote:
>>>> On 19/01/2023 14:51, Michael S. Tsirkin wrote:
>>>>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
>>>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>>>>>> qemu update without stopping the VM. In most cases that makes sense
>>>>>> because qemu has no way to transfer FUSE session state.
>>>>>>
>>>>>> But we can give an option to orchestrator to override this if it can
>>>>>> guarantee that state will be preserved (e.g. it uses migration to
>>>>>> update qemu and dst will run on the same host as src and use the same
>>>>>> socket endpoints).
>>>>>>
>>>>>> This patch keeps default behavior that prevents migration with such devices
>>>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>>>>>
>>>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>>>>>> ---
>>>>>>     hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>>>>>     qapi/migration.json       |  7 ++++++-
>>>>>>     2 files changed, 30 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
>>>>>> index f5049735ac..13d920423e 100644
>>>>>> --- a/hw/virtio/vhost-user-fs.c
>>>>>> +++ b/hw/virtio/vhost-user-fs.c
>>>>>> @@ -24,6 +24,7 @@
>>>>>>     #include "hw/virtio/vhost-user-fs.h"
>>>>>>     #include "monitor/monitor.h"
>>>>>>     #include "sysemu/sysemu.h"
>>>>>> +#include "migration/migration.h"
>>>>>>     static const int user_feature_bits[] = {
>>>>>>         VIRTIO_F_VERSION_1,
>>>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>>>>>>         return &fs->vhost_dev;
>>>>>>     }
>>>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>>>> +{
>>>>>> +    MigrationState *s = migrate_get_current();
>>>>>> +
>>>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>>>> +                     "migration capability to true to enable migration.");
>>>>> Isn't it possible that some backends are same and some are not?
>>>>> Shouldn't this be a device property then?
>>>> If some are not the same it is not guaranteed that correct FUSE
>>>> state is present there, so orchestrator shouldn't set the capability
>>>> because this can result in destination devices being broken (they'll
>>>> be fine after the remount in guest, but this is guest visible and is
>>>> not acceptable).
>>>>
>>>> I can imagine smart orchestrator and backend that can transfer
>>>> internal FUSE state, but we are not there yet, and this would be
>>>> their responsibility then to ensure endpoint compatibility between src
>>>> and dst and set the capability (that's why I put "e.g." and "or" in
>>>> the error description).
>>> So instead of relying on the orchestrator how about making it a device
>>> property?
>> We have to rely on the orchestrator here and I can't see how a property
>> can help us with this: same device can be migratable or not depending
>> on if the destination is the same host, what features backend supports,
>> how management software works and other factors of environment that
>> are not accessible from qemu or backend daemon.
>> So in the end we'll end up with orchestrator having to setup flags for
>> each device before each migration based on information only it can
>> have - in my opinion this is worse than just giving the orchestrator
>> a single flag that it can set after calculating the decision for
>> the particular migration that it organizes.
>>
>>>
>>>>>
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>>     static const VMStateDescription vuf_vmstate = {
>>>>>>         .name = "vhost-user-fs",
>>>>>> -    .unmigratable = 1,
>>>>>> +    .minimum_version_id = 0,
>>>>>> +    .version_id = 0,
>>>>>> +    .fields = (VMStateField[]) {
>>>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>>>> +        VMSTATE_END_OF_LIST()
>>>>>> +    },
>>>>>> +   .pre_save = vhost_user_fs_pre_save,
>>>>>>     };
>>>>>>     static Property vuf_properties[] = {
>>>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>>>> index 88ecf86ac8..9a229ea884 100644
>>>>>> --- a/qapi/migration.json
>>>>>> +++ b/qapi/migration.json
>>>>>> @@ -477,6 +477,11 @@
>>>>>>     #                    will be handled faster.  This is a performance feature and
>>>>>>     #                    should not affect the correctness of postcopy migration.
>>>>>>     #                    (since 7.1)
>>>>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of
>>>>>> +#                 vhost-user-fs devices, this should be enabled only when
>>>>>> +#                 backend can preserve local FUSE state e.g. for qemu update
>>>>>> +#                 when dst reconects to the same endpoints after migration.
>>>>>> +#                 (since 8.0)
>>>>>>     #
>>>>>>     # Features:
>>>>>>     # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>>>>>> @@ -492,7 +497,7 @@
>>>>>>                'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>>>>>>                { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>>>>>                'validate-uuid', 'background-snapshot',
>>>>>> -           'zero-copy-send', 'postcopy-preempt'] }
>>>>>> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
>>>>> I kind of dislike that it's such a specific flag. Is only vhost-user-fs
>>>>> ever going to be affected? Any way to put it in a way that is more generic?
>>>> Here I agree with you: I would prefer less narrow naming too. But I
>>>> didn't manage to come up with one. Looks like many other vhost-user
>>>> devices could benefit from this so maybe "vhost-user-stateless" or
>>>> something like this would be better.
>>>> I'm not sure that other types of devices could handle reconnect to
>>>> the old endpoint as easy as vhost-user-fs, but anyway the support for
>>>> this flag needs to be implemented for each device individually.
>>>> What do you think? Any ideas would be appreciated.
>>> Let's try to create a better description of when this flag should be
>>> set. Then shorten it up to create the name.
>> This flag should be set when qemu don't need to worry about any
>> external state stored in vhost-user daemons during migration:
>> don't fail migration, just pack generic virtio device states to
>> migration stream and orchestrator guarantees that the rest of the
>> state will be present at the destination to restore full context and
>> continue running.
> Sorry  I still do not get it.  So fundamentally, why do we need this property?
> vhost-user-fs is not created by default that we'd then need opt-in to
> the special "migrateable" case.
> That's why I said it might make some sense as a device property as qemu
> tracks whether device is unplugged for us.
>
> But as written, if you are going to teach the orchestrator about
> vhost-user-fs and its special needs, just teach it when to migrate and
> where not to migrate.
>
> Either we describe the special situation to qemu and let qemu
> make an intelligent decision whether to allow migration,
> or we trust the orchestrator. And if it's the latter, then 'migrate'
> already says orchestrator decided to migrate.

The problem I'm trying to solve is that most of vhost-user devices
now block migration in qemu. And this makes sense since qemu can't
extract and transfer backend daemon state. But this prevents us from
updating qemu executable via local migration. So this flag is
intended more as a safety check that says "I know what I'm doing".

I agree that it is not really necessary if we trust the orchestrator
to request migration only when the migration can be performed in a
safe way. But changing the current behavior of vhost-user-fs from
"always blocks migration" to "migrates partial state whenever
orchestrator requests it" seems a little  dangerous and can be
misinterpreted as full support for migration in all cases.

>
>>>>>>     ##
>>>>>>     # @MigrationCapabilityStatus:
>>>>>> -- 
>>>>>> 2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Michael S. Tsirkin 3 years ago

On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > This flag should be set when qemu don't need to worry about any
> > > external state stored in vhost-user daemons during migration:
> > > don't fail migration, just pack generic virtio device states to
> > > migration stream and orchestrator guarantees that the rest of the
> > > state will be present at the destination to restore full context and
> > > continue running.
> > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > vhost-user-fs is not created by default that we'd then need opt-in to
> > the special "migrateable" case.
> > That's why I said it might make some sense as a device property as qemu
> > tracks whether device is unplugged for us.
> > 
> > But as written, if you are going to teach the orchestrator about
> > vhost-user-fs and its special needs, just teach it when to migrate and
> > where not to migrate.
> > 
> > Either we describe the special situation to qemu and let qemu
> > make an intelligent decision whether to allow migration,
> > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > already says orchestrator decided to migrate.
> 
> The problem I'm trying to solve is that most of vhost-user devices
> now block migration in qemu. And this makes sense since qemu can't
> extract and transfer backend daemon state. But this prevents us from
> updating qemu executable via local migration. So this flag is
> intended more as a safety check that says "I know what I'm doing".
> 
> I agree that it is not really necessary if we trust the orchestrator
> to request migration only when the migration can be performed in a
> safe way. But changing the current behavior of vhost-user-fs from
> "always blocks migration" to "migrates partial state whenever
> orchestrator requests it" seems a little  dangerous and can be
> misinterpreted as full support for migration in all cases.

It's not really different from block is it? orchestrator has to arrange
for backend migration. I think we just assumed there's no use-case where
this is practical for vhost-user-fs so we blocked it.
But in any case it's orchestrator's responsibility.

-- 
MST

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
>>>> This flag should be set when qemu don't need to worry about any
>>>> external state stored in vhost-user daemons during migration:
>>>> don't fail migration, just pack generic virtio device states to
>>>> migration stream and orchestrator guarantees that the rest of the
>>>> state will be present at the destination to restore full context and
>>>> continue running.
>>> Sorry  I still do not get it.  So fundamentally, why do we need this property?
>>> vhost-user-fs is not created by default that we'd then need opt-in to
>>> the special "migrateable" case.
>>> That's why I said it might make some sense as a device property as qemu
>>> tracks whether device is unplugged for us.
>>>
>>> But as written, if you are going to teach the orchestrator about
>>> vhost-user-fs and its special needs, just teach it when to migrate and
>>> where not to migrate.
>>>
>>> Either we describe the special situation to qemu and let qemu
>>> make an intelligent decision whether to allow migration,
>>> or we trust the orchestrator. And if it's the latter, then 'migrate'
>>> already says orchestrator decided to migrate.
>> The problem I'm trying to solve is that most of vhost-user devices
>> now block migration in qemu. And this makes sense since qemu can't
>> extract and transfer backend daemon state. But this prevents us from
>> updating qemu executable via local migration. So this flag is
>> intended more as a safety check that says "I know what I'm doing".
>>
>> I agree that it is not really necessary if we trust the orchestrator
>> to request migration only when the migration can be performed in a
>> safe way. But changing the current behavior of vhost-user-fs from
>> "always blocks migration" to "migrates partial state whenever
>> orchestrator requests it" seems a little  dangerous and can be
>> misinterpreted as full support for migration in all cases.
> It's not really different from block is it? orchestrator has to arrange
> for backend migration. I think we just assumed there's no use-case where
> this is practical for vhost-user-fs so we blocked it.
> But in any case it's orchestrator's responsibility.

Yes, you are right. So do you think we should just drop the blocker
without adding a new flag?

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Michael S. Tsirkin 3 years ago

On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> 
> On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > This flag should be set when qemu don't need to worry about any
> > > > > external state stored in vhost-user daemons during migration:
> > > > > don't fail migration, just pack generic virtio device states to
> > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > state will be present at the destination to restore full context and
> > > > > continue running.
> > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > the special "migrateable" case.
> > > > That's why I said it might make some sense as a device property as qemu
> > > > tracks whether device is unplugged for us.
> > > > 
> > > > But as written, if you are going to teach the orchestrator about
> > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > where not to migrate.
> > > > 
> > > > Either we describe the special situation to qemu and let qemu
> > > > make an intelligent decision whether to allow migration,
> > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > already says orchestrator decided to migrate.
> > > The problem I'm trying to solve is that most of vhost-user devices
> > > now block migration in qemu. And this makes sense since qemu can't
> > > extract and transfer backend daemon state. But this prevents us from
> > > updating qemu executable via local migration. So this flag is
> > > intended more as a safety check that says "I know what I'm doing".
> > > 
> > > I agree that it is not really necessary if we trust the orchestrator
> > > to request migration only when the migration can be performed in a
> > > safe way. But changing the current behavior of vhost-user-fs from
> > > "always blocks migration" to "migrates partial state whenever
> > > orchestrator requests it" seems a little  dangerous and can be
> > > misinterpreted as full support for migration in all cases.
> > It's not really different from block is it? orchestrator has to arrange
> > for backend migration. I think we just assumed there's no use-case where
> > this is practical for vhost-user-fs so we blocked it.
> > But in any case it's orchestrator's responsibility.
> 
> Yes, you are right. So do you think we should just drop the blocker
> without adding a new flag?

I'd be inclined to. I am curious what do dgilbert and stefanha think though.

-- 
MST

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Dr. David Alan Gilbert 3 years ago

* Michael S. Tsirkin (mst@redhat.com) wrote:
> On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > 
> > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > external state stored in vhost-user daemons during migration:
> > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > state will be present at the destination to restore full context and
> > > > > > continue running.
> > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > the special "migrateable" case.
> > > > > That's why I said it might make some sense as a device property as qemu
> > > > > tracks whether device is unplugged for us.
> > > > > 
> > > > > But as written, if you are going to teach the orchestrator about
> > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > where not to migrate.
> > > > > 
> > > > > Either we describe the special situation to qemu and let qemu
> > > > > make an intelligent decision whether to allow migration,
> > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > already says orchestrator decided to migrate.
> > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > now block migration in qemu. And this makes sense since qemu can't
> > > > extract and transfer backend daemon state. But this prevents us from
> > > > updating qemu executable via local migration. So this flag is
> > > > intended more as a safety check that says "I know what I'm doing".
> > > > 
> > > > I agree that it is not really necessary if we trust the orchestrator
> > > > to request migration only when the migration can be performed in a
> > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > "always blocks migration" to "migrates partial state whenever
> > > > orchestrator requests it" seems a little  dangerous and can be
> > > > misinterpreted as full support for migration in all cases.
> > > It's not really different from block is it? orchestrator has to arrange
> > > for backend migration. I think we just assumed there's no use-case where
> > > this is practical for vhost-user-fs so we blocked it.
> > > But in any case it's orchestrator's responsibility.
> > 
> > Yes, you are right. So do you think we should just drop the blocker
> > without adding a new flag?
> 
> I'd be inclined to. I am curious what do dgilbert and stefanha think though.

Yes I think that's probably OK, as long as we use the flag for knowing
how to handle the discard bitmap as a proxy for the daemon knowing how
to handle *some* migrations; knowing which migrations is then the job
for the orchestrator to be careful of.

Dave

> -- 
> MST
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote:
> * Michael S. Tsirkin (mst@redhat.com) wrote:
> > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > 
> > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > > external state stored in vhost-user daemons during migration:
> > > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > > state will be present at the destination to restore full context and
> > > > > > > continue running.
> > > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > > the special "migrateable" case.
> > > > > > That's why I said it might make some sense as a device property as qemu
> > > > > > tracks whether device is unplugged for us.
> > > > > > 
> > > > > > But as written, if you are going to teach the orchestrator about
> > > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > > where not to migrate.
> > > > > > 
> > > > > > Either we describe the special situation to qemu and let qemu
> > > > > > make an intelligent decision whether to allow migration,
> > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > > already says orchestrator decided to migrate.
> > > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > > now block migration in qemu. And this makes sense since qemu can't
> > > > > extract and transfer backend daemon state. But this prevents us from
> > > > > updating qemu executable via local migration. So this flag is
> > > > > intended more as a safety check that says "I know what I'm doing".
> > > > > 
> > > > > I agree that it is not really necessary if we trust the orchestrator
> > > > > to request migration only when the migration can be performed in a
> > > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > > "always blocks migration" to "migrates partial state whenever
> > > > > orchestrator requests it" seems a little  dangerous and can be
> > > > > misinterpreted as full support for migration in all cases.
> > > > It's not really different from block is it? orchestrator has to arrange
> > > > for backend migration. I think we just assumed there's no use-case where
> > > > this is practical for vhost-user-fs so we blocked it.
> > > > But in any case it's orchestrator's responsibility.
> > > 
> > > Yes, you are right. So do you think we should just drop the blocker
> > > without adding a new flag?
> > 
> > I'd be inclined to. I am curious what do dgilbert and stefanha think though.
> 
> Yes I think that's probably OK, as long as we use the flag for knowing
> how to handle the discard bitmap as a proxy for the daemon knowing how
> to handle *some* migrations; knowing which migrations is then the job
> for the orchestrator to be careful of.

I think the feature bit is not a good way to detect live migration
support. vhost-user backends typically use libvhost-user, rust-vmm's
vhost-user-backend crate, etc where this feature can be implemented for
free. If the feature bit is advertized we don't know if the device
implementation (net, blk, fs, etc) is aware of migration at all.

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Mon, 23 Jan 2023 at 14:54, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > >
> > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > > > external state stored in vhost-user daemons during migration:
> > > > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > > > state will be present at the destination to restore full context and
> > > > > > > > continue running.
> > > > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > > > the special "migrateable" case.
> > > > > > > That's why I said it might make some sense as a device property as qemu
> > > > > > > tracks whether device is unplugged for us.
> > > > > > >
> > > > > > > But as written, if you are going to teach the orchestrator about
> > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > > > where not to migrate.
> > > > > > >
> > > > > > > Either we describe the special situation to qemu and let qemu
> > > > > > > make an intelligent decision whether to allow migration,
> > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > > > already says orchestrator decided to migrate.
> > > > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > > > now block migration in qemu. And this makes sense since qemu can't
> > > > > > extract and transfer backend daemon state. But this prevents us from
> > > > > > updating qemu executable via local migration. So this flag is
> > > > > > intended more as a safety check that says "I know what I'm doing".
> > > > > >
> > > > > > I agree that it is not really necessary if we trust the orchestrator
> > > > > > to request migration only when the migration can be performed in a
> > > > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > > > "always blocks migration" to "migrates partial state whenever
> > > > > > orchestrator requests it" seems a little  dangerous and can be
> > > > > > misinterpreted as full support for migration in all cases.
> > > > > It's not really different from block is it? orchestrator has to arrange
> > > > > for backend migration. I think we just assumed there's no use-case where
> > > > > this is practical for vhost-user-fs so we blocked it.
> > > > > But in any case it's orchestrator's responsibility.
> > > >
> > > > Yes, you are right. So do you think we should just drop the blocker
> > > > without adding a new flag?
> > >
> > > I'd be inclined to. I am curious what do dgilbert and stefanha think though.
> >
> > Yes I think that's probably OK, as long as we use the flag for knowing
> > how to handle the discard bitmap as a proxy for the daemon knowing how
> > to handle *some* migrations; knowing which migrations is then the job
> > for the orchestrator to be careful of.
>
> I think the feature bit is not a good way to detect live migration
> support. vhost-user backends typically use libvhost-user, rust-vmm's
> vhost-user-backend crate, etc where this feature can be implemented for
> free. If the feature bit is advertized we don't know if the device
> implementation (net, blk, fs, etc) is aware of migration at all.

I checked how bad the situation is. libvhost-user currently enables
LOG_ALL by default. :(

So I don't think the front-end can use LOG_ALL alone to determine
whether or not migration is supported by the back-end.

There are several existing back-ends based on libvhost-user that have
no concept of reconnection or migration but report the LOG_ALL feature
bit.

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Dr. David Alan Gilbert 3 years ago

* Stefan Hajnoczi (stefanha@gmail.com) wrote:
> On Mon, 23 Jan 2023 at 14:54, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote:
> > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > > >
> > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > > > > external state stored in vhost-user daemons during migration:
> > > > > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > > > > state will be present at the destination to restore full context and
> > > > > > > > > continue running.
> > > > > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > > > > the special "migrateable" case.
> > > > > > > > That's why I said it might make some sense as a device property as qemu
> > > > > > > > tracks whether device is unplugged for us.
> > > > > > > >
> > > > > > > > But as written, if you are going to teach the orchestrator about
> > > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > > > > where not to migrate.
> > > > > > > >
> > > > > > > > Either we describe the special situation to qemu and let qemu
> > > > > > > > make an intelligent decision whether to allow migration,
> > > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > > > > already says orchestrator decided to migrate.
> > > > > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > > > > now block migration in qemu. And this makes sense since qemu can't
> > > > > > > extract and transfer backend daemon state. But this prevents us from
> > > > > > > updating qemu executable via local migration. So this flag is
> > > > > > > intended more as a safety check that says "I know what I'm doing".
> > > > > > >
> > > > > > > I agree that it is not really necessary if we trust the orchestrator
> > > > > > > to request migration only when the migration can be performed in a
> > > > > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > > > > "always blocks migration" to "migrates partial state whenever
> > > > > > > orchestrator requests it" seems a little  dangerous and can be
> > > > > > > misinterpreted as full support for migration in all cases.
> > > > > > It's not really different from block is it? orchestrator has to arrange
> > > > > > for backend migration. I think we just assumed there's no use-case where
> > > > > > this is practical for vhost-user-fs so we blocked it.
> > > > > > But in any case it's orchestrator's responsibility.
> > > > >
> > > > > Yes, you are right. So do you think we should just drop the blocker
> > > > > without adding a new flag?
> > > >
> > > > I'd be inclined to. I am curious what do dgilbert and stefanha think though.
> > >
> > > Yes I think that's probably OK, as long as we use the flag for knowing
> > > how to handle the discard bitmap as a proxy for the daemon knowing how
> > > to handle *some* migrations; knowing which migrations is then the job
> > > for the orchestrator to be careful of.
> >
> > I think the feature bit is not a good way to detect live migration
> > support. vhost-user backends typically use libvhost-user, rust-vmm's
> > vhost-user-backend crate, etc where this feature can be implemented for
> > free. If the feature bit is advertized we don't know if the device
> > implementation (net, blk, fs, etc) is aware of migration at all.
> 
> I checked how bad the situation is. libvhost-user currently enables
> LOG_ALL by default. :(
> 
> So I don't think the front-end can use LOG_ALL alone to determine
> whether or not migration is supported by the back-end.
> 
> There are several existing back-ends based on libvhost-user that have
> no concept of reconnection or migration but report the LOG_ALL feature
> bit.

Ouch, yes that's messy.

Going back to the original question; I don't think a command line flag
will work though, because even for a given VM there's the possibility
of some (local) migrations working but other (remote) migrations not
working; so you don't know at the point you start the VM whether
your migrations are going to work.

Dave

> Stefan
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Tue, Jan 24, 2023, 04:50 Dr. David Alan Gilbert <dgilbert@redhat.com>
wrote:

> * Stefan Hajnoczi (stefanha@gmail.com) wrote:
> > On Mon, 23 Jan 2023 at 14:54, Stefan Hajnoczi <stefanha@redhat.com>
> wrote:
> > >
> > > On Mon, Jan 23, 2023 at 06:27:23PM +0000, Dr. David Alan Gilbert wrote:
> > > > * Michael S. Tsirkin (mst@redhat.com) wrote:
> > > > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > > > >
> > > > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > > > > This flag should be set when qemu don't need to worry
> about any
> > > > > > > > > > external state stored in vhost-user daemons during
> migration:
> > > > > > > > > > don't fail migration, just pack generic virtio device
> states to
> > > > > > > > > > migration stream and orchestrator guarantees that the
> rest of the
> > > > > > > > > > state will be present at the destination to restore full
> context and
> > > > > > > > > > continue running.
> > > > > > > > > Sorry  I still do not get it.  So fundamentally, why do we
> need this property?
> > > > > > > > > vhost-user-fs is not created by default that we'd then
> need opt-in to
> > > > > > > > > the special "migrateable" case.
> > > > > > > > > That's why I said it might make some sense as a device
> property as qemu
> > > > > > > > > tracks whether device is unplugged for us.
> > > > > > > > >
> > > > > > > > > But as written, if you are going to teach the orchestrator
> about
> > > > > > > > > vhost-user-fs and its special needs, just teach it when to
> migrate and
> > > > > > > > > where not to migrate.
> > > > > > > > >
> > > > > > > > > Either we describe the special situation to qemu and let
> qemu
> > > > > > > > > make an intelligent decision whether to allow migration,
> > > > > > > > > or we trust the orchestrator. And if it's the latter, then
> 'migrate'
> > > > > > > > > already says orchestrator decided to migrate.
> > > > > > > > The problem I'm trying to solve is that most of vhost-user
> devices
> > > > > > > > now block migration in qemu. And this makes sense since qemu
> can't
> > > > > > > > extract and transfer backend daemon state. But this prevents
> us from
> > > > > > > > updating qemu executable via local migration. So this flag is
> > > > > > > > intended more as a safety check that says "I know what I'm
> doing".
> > > > > > > >
> > > > > > > > I agree that it is not really necessary if we trust the
> orchestrator
> > > > > > > > to request migration only when the migration can be
> performed in a
> > > > > > > > safe way. But changing the current behavior of vhost-user-fs
> from
> > > > > > > > "always blocks migration" to "migrates partial state whenever
> > > > > > > > orchestrator requests it" seems a little  dangerous and can
> be
> > > > > > > > misinterpreted as full support for migration in all cases.
> > > > > > > It's not really different from block is it? orchestrator has
> to arrange
> > > > > > > for backend migration. I think we just assumed there's no
> use-case where
> > > > > > > this is practical for vhost-user-fs so we blocked it.
> > > > > > > But in any case it's orchestrator's responsibility.
> > > > > >
> > > > > > Yes, you are right. So do you think we should just drop the
> blocker
> > > > > > without adding a new flag?
> > > > >
> > > > > I'd be inclined to. I am curious what do dgilbert and stefanha
> think though.
> > > >
> > > > Yes I think that's probably OK, as long as we use the flag for
> knowing
> > > > how to handle the discard bitmap as a proxy for the daemon knowing
> how
> > > > to handle *some* migrations; knowing which migrations is then the job
> > > > for the orchestrator to be careful of.
> > >
> > > I think the feature bit is not a good way to detect live migration
> > > support. vhost-user backends typically use libvhost-user, rust-vmm's
> > > vhost-user-backend crate, etc where this feature can be implemented for
> > > free. If the feature bit is advertized we don't know if the device
> > > implementation (net, blk, fs, etc) is aware of migration at all.
> >
> > I checked how bad the situation is. libvhost-user currently enables
> > LOG_ALL by default. :(
> >
> > So I don't think the front-end can use LOG_ALL alone to determine
> > whether or not migration is supported by the back-end.
> >
> > There are several existing back-ends based on libvhost-user that have
> > no concept of reconnection or migration but report the LOG_ALL feature
> > bit.
>
> Ouch, yes that's messy.
>
> Going back to the original question; I don't think a command line flag
> will work though, because even for a given VM there's the possibility
> of some (local) migrations working but other (remote) migrations not
> working; so you don't know at the point you start the VM whether
> your migrations are going to work.
>

The user or management tool should know which types of migration a
vhost-user-fs backend supports. That can be passed in as a per-device
parameter.

Then a migration parameter can be used to distinguish between same host and
remote host migration? QEMU already distinguishes between pre-copy and
post-copy migration, so this can be thought of as yet another type of
migration.

Stefan

>

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Juan Quintela 3 years ago

Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Tue, Jan 24, 2023, 04:50 Dr. David Alan Gilbert <dgilbert@redhat.com>
> wrote:

[...]

>> > I checked how bad the situation is. libvhost-user currently enables
>> > LOG_ALL by default. :(
>> >
>> > So I don't think the front-end can use LOG_ALL alone to determine
>> > whether or not migration is supported by the back-end.
>> >
>> > There are several existing back-ends based on libvhost-user that have
>> > no concept of reconnection or migration but report the LOG_ALL feature
>> > bit.
>>
>> Ouch, yes that's messy.
>>
>> Going back to the original question; I don't think a command line flag
>> will work though, because even for a given VM there's the possibility
>> of some (local) migrations working but other (remote) migrations not
>> working; so you don't know at the point you start the VM whether
>> your migrations are going to work.
>>
>
> The user or management tool should know which types of migration a
> vhost-user-fs backend supports. That can be passed in as a per-device
> parameter.
>
> Then a migration parameter can be used to distinguish between same host and
> remote host migration? QEMU already distinguishes between pre-copy and
> post-copy migration, so this can be thought of as yet another type of
> migration.

I was going to suggest this (my previous answer was after reading only
the other part of the comments).

What we have here is that this device has "three" states:
- You can't migrate it to other host (now and the default behaviour)
- You can migrate some of the backends if you are migrating in the same
  host (note, we don't know directly that we are migrating inside the
  same host, so I would agree to add _that_ migration capability, that
  is related to migration, and it makes sense for migration code and
  devices to know that is happening)
- In the future, perhaps, you can migrate "always" some vhost-use-fs,
  that would be a property on my opinion.

Later, Juan.

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> >
> > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > external state stored in vhost-user daemons during migration:
> > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > state will be present at the destination to restore full context and
> > > > > > continue running.
> > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > the special "migrateable" case.
> > > > > That's why I said it might make some sense as a device property as qemu
> > > > > tracks whether device is unplugged for us.
> > > > >
> > > > > But as written, if you are going to teach the orchestrator about
> > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > where not to migrate.
> > > > >
> > > > > Either we describe the special situation to qemu and let qemu
> > > > > make an intelligent decision whether to allow migration,
> > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > already says orchestrator decided to migrate.
> > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > now block migration in qemu. And this makes sense since qemu can't
> > > > extract and transfer backend daemon state. But this prevents us from
> > > > updating qemu executable via local migration. So this flag is
> > > > intended more as a safety check that says "I know what I'm doing".
> > > >
> > > > I agree that it is not really necessary if we trust the orchestrator
> > > > to request migration only when the migration can be performed in a
> > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > "always blocks migration" to "migrates partial state whenever
> > > > orchestrator requests it" seems a little  dangerous and can be
> > > > misinterpreted as full support for migration in all cases.
> > > It's not really different from block is it? orchestrator has to arrange
> > > for backend migration. I think we just assumed there's no use-case where
> > > this is practical for vhost-user-fs so we blocked it.
> > > But in any case it's orchestrator's responsibility.
> >
> > Yes, you are right. So do you think we should just drop the blocker
> > without adding a new flag?
>
> I'd be inclined to. I am curious what do dgilbert and stefanha think though.

If the migration blocker is removed, what happens when a user attempts
to migrate with a management tool and/or vhost-user-fs server
implementation that don't support migration?

Anton: Can you explain how stateless migration will work on the
vhost-user-fs back-end side? Is it reusing vhost-user reconnect
functionality or introducing a new mode for stateless migration? I
guess the vhost-user-fs back-end implementation is required to
implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all
in-flight requests when vrings are stopped?

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 23/01/2023 16:09, Stefan Hajnoczi wrote:
> On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote:
>> On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
>>> On 22/01/2023 16:46, Michael S. Tsirkin wrote:
>>>> On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
>>>>>>> This flag should be set when qemu don't need to worry about any
>>>>>>> external state stored in vhost-user daemons during migration:
>>>>>>> don't fail migration, just pack generic virtio device states to
>>>>>>> migration stream and orchestrator guarantees that the rest of the
>>>>>>> state will be present at the destination to restore full context and
>>>>>>> continue running.
>>>>>> Sorry  I still do not get it.  So fundamentally, why do we need this property?
>>>>>> vhost-user-fs is not created by default that we'd then need opt-in to
>>>>>> the special "migrateable" case.
>>>>>> That's why I said it might make some sense as a device property as qemu
>>>>>> tracks whether device is unplugged for us.
>>>>>>
>>>>>> But as written, if you are going to teach the orchestrator about
>>>>>> vhost-user-fs and its special needs, just teach it when to migrate and
>>>>>> where not to migrate.
>>>>>>
>>>>>> Either we describe the special situation to qemu and let qemu
>>>>>> make an intelligent decision whether to allow migration,
>>>>>> or we trust the orchestrator. And if it's the latter, then 'migrate'
>>>>>> already says orchestrator decided to migrate.
>>>>> The problem I'm trying to solve is that most of vhost-user devices
>>>>> now block migration in qemu. And this makes sense since qemu can't
>>>>> extract and transfer backend daemon state. But this prevents us from
>>>>> updating qemu executable via local migration. So this flag is
>>>>> intended more as a safety check that says "I know what I'm doing".
>>>>>
>>>>> I agree that it is not really necessary if we trust the orchestrator
>>>>> to request migration only when the migration can be performed in a
>>>>> safe way. But changing the current behavior of vhost-user-fs from
>>>>> "always blocks migration" to "migrates partial state whenever
>>>>> orchestrator requests it" seems a little  dangerous and can be
>>>>> misinterpreted as full support for migration in all cases.
>>>> It's not really different from block is it? orchestrator has to arrange
>>>> for backend migration. I think we just assumed there's no use-case where
>>>> this is practical for vhost-user-fs so we blocked it.
>>>> But in any case it's orchestrator's responsibility.
>>> Yes, you are right. So do you think we should just drop the blocker
>>> without adding a new flag?
>> I'd be inclined to. I am curious what do dgilbert and stefanha think though.
> If the migration blocker is removed, what happens when a user attempts
> to migrate with a management tool and/or vhost-user-fs server
> implementation that don't support migration?

There will be no matching fuse-session in destination endpoint so all
requests to this fs will fail until it is remounted from guest to
send new FUSE_INIT message that does session setup.

>
> Anton: Can you explain how stateless migration will work on the
> vhost-user-fs back-end side? Is it reusing vhost-user reconnect
> functionality or introducing a new mode for stateless migration? I
> guess the vhost-user-fs back-end implementation is required to
> implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all
> in-flight requests when vrings are stopped?

It reuses existing vhost-user reconnect code to resubmit inflight
requests.
Sure, backend needs to support this feature - presence of required
features is checked by generic vhost and vhost-user code during init
and if something is missing migration blocker is assigned to the
device (not a static one in vmstate that I remove in this patch, but
other per-device kind of blocker).

>
> Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Mon, Jan 23, 2023 at 05:52:17PM +0200, Anton Kuchin wrote:
> 
> On 23/01/2023 16:09, Stefan Hajnoczi wrote:
> > On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > > > external state stored in vhost-user daemons during migration:
> > > > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > > > state will be present at the destination to restore full context and
> > > > > > > > continue running.
> > > > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > > > the special "migrateable" case.
> > > > > > > That's why I said it might make some sense as a device property as qemu
> > > > > > > tracks whether device is unplugged for us.
> > > > > > > 
> > > > > > > But as written, if you are going to teach the orchestrator about
> > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > > > where not to migrate.
> > > > > > > 
> > > > > > > Either we describe the special situation to qemu and let qemu
> > > > > > > make an intelligent decision whether to allow migration,
> > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > > > already says orchestrator decided to migrate.
> > > > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > > > now block migration in qemu. And this makes sense since qemu can't
> > > > > > extract and transfer backend daemon state. But this prevents us from
> > > > > > updating qemu executable via local migration. So this flag is
> > > > > > intended more as a safety check that says "I know what I'm doing".
> > > > > > 
> > > > > > I agree that it is not really necessary if we trust the orchestrator
> > > > > > to request migration only when the migration can be performed in a
> > > > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > > > "always blocks migration" to "migrates partial state whenever
> > > > > > orchestrator requests it" seems a little  dangerous and can be
> > > > > > misinterpreted as full support for migration in all cases.
> > > > > It's not really different from block is it? orchestrator has to arrange
> > > > > for backend migration. I think we just assumed there's no use-case where
> > > > > this is practical for vhost-user-fs so we blocked it.
> > > > > But in any case it's orchestrator's responsibility.
> > > > Yes, you are right. So do you think we should just drop the blocker
> > > > without adding a new flag?
> > > I'd be inclined to. I am curious what do dgilbert and stefanha think though.
> > If the migration blocker is removed, what happens when a user attempts
> > to migrate with a management tool and/or vhost-user-fs server
> > implementation that don't support migration?
> 
> There will be no matching fuse-session in destination endpoint so all
> requests to this fs will fail until it is remounted from guest to
> send new FUSE_INIT message that does session setup.

The point of the migration blocker is to prevent breaking running
guests. Situations where a migration completes but results in a broken
guest are problematic for users (especially when they are not logged in
to guests and able to fix them interactively).

If a command-line option is set to override the blocker, that's fine.
But there needs to be a blocker by default if external knowledge is
required to decide whether or not it's safe to migrate.

> > 
> > Anton: Can you explain how stateless migration will work on the
> > vhost-user-fs back-end side? Is it reusing vhost-user reconnect
> > functionality or introducing a new mode for stateless migration? I
> > guess the vhost-user-fs back-end implementation is required to
> > implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all
> > in-flight requests when vrings are stopped?
> 
> It reuses existing vhost-user reconnect code to resubmit inflight
> requests.
> Sure, backend needs to support this feature - presence of required
> features is checked by generic vhost and vhost-user code during init
> and if something is missing migration blocker is assigned to the
> device (not a static one in vmstate that I remove in this patch, but
> other per-device kind of blocker).

This is not enough detail. Please post the QEMU patches before we commit
to a user-visible vhost-user-fs command-line parameter.

I think what you're trying is a new approach that can be made to work.
However, both vhost-user and migration are fragile and you have not
explained how it will work. I don't have confidence in merging this
incrementally because I'm afraid of committing to user-visible or
vhost-user protocol behavior that turns out to be broken just a little
while later.

The kind of detail I was hoping to hear was, for example, how
vhost_user_blk_device_realize() blocks and tries to reconnect 3 times.
Does this approach work for stateless migration? The destination QEMU is
launched before the source QEMU disconnects from the vhost-user UNIX
domain socket, so I guess the destination QEMU cannot connect in the
current version of vhost-user reconnect as implemented by QEMU's
vhost-user-blk device. Have you come up with a new handover protocol?

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Michael S. Tsirkin 3 years ago

On Mon, Jan 23, 2023 at 02:49:39PM -0500, Stefan Hajnoczi wrote:
> The point of the migration blocker is to prevent breaking running
> guests. Situations where a migration completes but results in a broken
> guest are problematic for users (especially when they are not logged in
> to guests and able to fix them interactively).

I thought it's the reverse - we block when we are sure it can't work.
If we are not sure we should leave policy to orchestrators.

You can always get into situations like this with stateful storage (as
opposed to RO one).  For example, naively scp the backend file then
start migration. Will seem to work but corrupts the disk (I didn't try,
for sure with raw but what about qcow2?)

> If a command-line option is set to override the blocker, that's fine.
> But there needs to be a blocker by default if external knowledge is
> required to decide whether or not it's safe to migrate.

If all the command line says is "I want migration to work" then
that's more like shifting the blame than helping users.
They just learn this one weird trick and it seems to work
until it doesn't.  Then we are like "we told you only to set this
flag if you are sure" and they are like "well I was sure".

-- 
MST

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Mon, Jan 23, 2023 at 04:00:50PM -0500, Michael S. Tsirkin wrote:
> On Mon, Jan 23, 2023 at 02:49:39PM -0500, Stefan Hajnoczi wrote:
> > The point of the migration blocker is to prevent breaking running
> > guests. Situations where a migration completes but results in a broken
> > guest are problematic for users (especially when they are not logged in
> > to guests and able to fix them interactively).
> 
> I thought it's the reverse - we block when we are sure it can't work.
> If we are not sure we should leave policy to orchestrators.
> 
> You can always get into situations like this with stateful storage (as
> opposed to RO one).  For example, naively scp the backend file then
> start migration. Will seem to work but corrupts the disk (I didn't try,
> for sure with raw but what about qcow2?)

You're right that ultimately QEMU cannot verify that the destination
will 100% work. Who knows if the destination is even a real QEMU or just
a process that throws away the migration stream? :)

> > If a command-line option is set to override the blocker, that's fine.
> > But there needs to be a blocker by default if external knowledge is
> > required to decide whether or not it's safe to migrate.
> 
> If all the command line says is "I want migration to work" then
> that's more like shifting the blame than helping users.
> They just learn this one weird trick and it seems to work
> until it doesn't.  Then we are like "we told you only to set this
> flag if you are sure" and they are like "well I was sure".

What I'm getting at is that this is a breaking change. Previously the
management tool didn't need to be aware of vhost-user-fs migration
support. QEMU would reject migrations. We cannot start allowing them
because management tools may depend on QEMU's migration blocker.

If management tools need to be aware now then the safe way to introduce
this is via a parameter that new management tools must explicitly pass
to QEMU.

Anton's same host migration case is valid but the majority of users
migrate between hosts and that case is not supported yet. Most of the
time vhost-user-fs migration won't work. Let's not break existing
management tools and vhost-user-fs back-ends.

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Dr. David Alan Gilbert 3 years ago

* Anton Kuchin (antonkuchin@yandex-team.ru) wrote:
> On 19/01/2023 14:51, Michael S. Tsirkin wrote:
> > On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
> > > Now any vhost-user-fs device makes VM unmigratable, that also prevents
> > > qemu update without stopping the VM. In most cases that makes sense
> > > because qemu has no way to transfer FUSE session state.
> > > 
> > > But we can give an option to orchestrator to override this if it can
> > > guarantee that state will be preserved (e.g. it uses migration to
> > > update qemu and dst will run on the same host as src and use the same
> > > socket endpoints).
> > > 
> > > This patch keeps default behavior that prevents migration with such devices
> > > but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> > > 
> > > Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> > > ---
> > >   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> > >   qapi/migration.json       |  7 ++++++-
> > >   2 files changed, 30 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> > > index f5049735ac..13d920423e 100644
> > > --- a/hw/virtio/vhost-user-fs.c
> > > +++ b/hw/virtio/vhost-user-fs.c
> > > @@ -24,6 +24,7 @@
> > >   #include "hw/virtio/vhost-user-fs.h"
> > >   #include "monitor/monitor.h"
> > >   #include "sysemu/sysemu.h"
> > > +#include "migration/migration.h"
> > >   static const int user_feature_bits[] = {
> > >       VIRTIO_F_VERSION_1,
> > > @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> > >       return &fs->vhost_dev;
> > >   }
> > > +static int vhost_user_fs_pre_save(void *opaque)
> > > +{
> > > +    MigrationState *s = migrate_get_current();
> > > +
> > > +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> > > +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> > > +                     "state of backend to be preserved. If orchestrator can "
> > > +                     "guarantee this (e.g. dst connects to the same backend "
> > > +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> > > +                     "migration capability to true to enable migration.");
> > Isn't it possible that some backends are same and some are not?
> > Shouldn't this be a device property then?
> If some are not the same it is not guaranteed that correct FUSE
> state is present there, so orchestrator shouldn't set the capability
> because this can result in destination devices being broken (they'll
> be fine after the remount in guest, but this is guest visible and is
> not acceptable).

Shouldn't this be something that's negotiated as a feature bit in the
vhost-user protocol - i.e. to say whether the device can do migration?

However, I think what you're saying is that a migration might only be
doable in some cases - e.g. a local migration - and that's hard for
either qemu or the daemon to discover by itself; so it's right that
an orchestrator enables it.
(I'm not sure the error_report message needs to be quite so verbose -
normally a one line clear message is enough that people can then look
up).

> I can imagine smart orchestrator and backend that can transfer
> internal FUSE state, but we are not there yet, and this would be
> their responsibility then to ensure endpoint compatibility between src
> and dst and set the capability (that's why I put "e.g." and "or" in
> the error description).

You also need the vhost-user device to be able to do the dirty bitmap
updates; is that being checked somewhere?

Dave

> > 
> > 
> > 
> > > +        return -1;
> > > +    }
> > > +
> > > +    return 0;
> > > +}
> > > +
> > >   static const VMStateDescription vuf_vmstate = {
> > >       .name = "vhost-user-fs",
> > > -    .unmigratable = 1,
> > > +    .minimum_version_id = 0,
> > > +    .version_id = 0,
> > > +    .fields = (VMStateField[]) {
> > > +        VMSTATE_VIRTIO_DEVICE,
> > > +        VMSTATE_END_OF_LIST()
> > > +    },
> > > +   .pre_save = vhost_user_fs_pre_save,
> > >   };
> > >   static Property vuf_properties[] = {
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 88ecf86ac8..9a229ea884 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -477,6 +477,11 @@
> > >   #                    will be handled faster.  This is a performance feature and
> > >   #                    should not affect the correctness of postcopy migration.
> > >   #                    (since 7.1)
> > > +# @vhost-user-fs: If enabled, the migration process will allow migration of
> > > +#                 vhost-user-fs devices, this should be enabled only when
> > > +#                 backend can preserve local FUSE state e.g. for qemu update
> > > +#                 when dst reconects to the same endpoints after migration.
> > > +#                 (since 8.0)
> > >   #
> > >   # Features:
> > >   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> > > @@ -492,7 +497,7 @@
> > >              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> > >              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> > >              'validate-uuid', 'background-snapshot',
> > > -           'zero-copy-send', 'postcopy-preempt'] }
> > > +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
> > I kind of dislike that it's such a specific flag. Is only vhost-user-fs
> > ever going to be affected? Any way to put it in a way that is more generic?
> Here I agree with you: I would prefer less narrow naming too. But I
> didn't manage to come up with one. Looks like many other vhost-user
> devices could benefit from this so maybe "vhost-user-stateless" or
> something like this would be better.
> I'm not sure that other types of devices could handle reconnect to
> the old endpoint as easy as vhost-user-fs, but anyway the support for
> this flag needs to be implemented for each device individually.
> What do you think? Any ideas would be appreciated.
> 
> > 
> > 
> > >   ##
> > >   # @MigrationCapabilityStatus:
> > > -- 
> > > 2.34.1
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 19/01/2023 21:00, Dr. David Alan Gilbert wrote:
> * Anton Kuchin (antonkuchin@yandex-team.ru) wrote:
>> On 19/01/2023 14:51, Michael S. Tsirkin wrote:
>>> On Sun, Jan 15, 2023 at 07:09:03PM +0200, Anton Kuchin wrote:
>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>>>> qemu update without stopping the VM. In most cases that makes sense
>>>> because qemu has no way to transfer FUSE session state.
>>>>
>>>> But we can give an option to orchestrator to override this if it can
>>>> guarantee that state will be preserved (e.g. it uses migration to
>>>> update qemu and dst will run on the same host as src and use the same
>>>> socket endpoints).
>>>>
>>>> This patch keeps default behavior that prevents migration with such devices
>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>>>
>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>>>> ---
>>>>    hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>>>    qapi/migration.json       |  7 ++++++-
>>>>    2 files changed, 30 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
>>>> index f5049735ac..13d920423e 100644
>>>> --- a/hw/virtio/vhost-user-fs.c
>>>> +++ b/hw/virtio/vhost-user-fs.c
>>>> @@ -24,6 +24,7 @@
>>>>    #include "hw/virtio/vhost-user-fs.h"
>>>>    #include "monitor/monitor.h"
>>>>    #include "sysemu/sysemu.h"
>>>> +#include "migration/migration.h"
>>>>    static const int user_feature_bits[] = {
>>>>        VIRTIO_F_VERSION_1,
>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>>>>        return &fs->vhost_dev;
>>>>    }
>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>> +{
>>>> +    MigrationState *s = migrate_get_current();
>>>> +
>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>> +                     "migration capability to true to enable migration.");
>>> Isn't it possible that some backends are same and some are not?
>>> Shouldn't this be a device property then?
>> If some are not the same it is not guaranteed that correct FUSE
>> state is present there, so orchestrator shouldn't set the capability
>> because this can result in destination devices being broken (they'll
>> be fine after the remount in guest, but this is guest visible and is
>> not acceptable).
> Shouldn't this be something that's negotiated as a feature bit in the
> vhost-user protocol - i.e. to say whether the device can do migration?
>
> However, I think what you're saying is that a migration might only be
> doable in some cases - e.g. a local migration - and that's hard for
> either qemu or the daemon to discover by itself; so it's right that
> an orchestrator enables it.
> (I'm not sure the error_report message needs to be quite so verbose -
> normally a one line clear message is enough that people can then look
> up).


Exactly, migration depends not only on device capabilities but also
on backends, hosts, management and other environment not known to
qemu or backend so I can't see how this can be device config or
negotiated via the protocol.

In the message I tried to make clear that just enabling the capability
without proper setup can be dangerous to reduce temptation to use it
recklessly and get broken virtiofs device later. I'll try to shorten
the message. I would appreciate if you could help me with the wording
(I'm not a native speaker so building a short sentence with proper
level of warning is challenging)

>
>> I can imagine smart orchestrator and backend that can transfer
>> internal FUSE state, but we are not there yet, and this would be
>> their responsibility then to ensure endpoint compatibility between src
>> and dst and set the capability (that's why I put "e.g." and "or" in
>> the error description).
> You also need the vhost-user device to be able to do the dirty bitmap
> updates; is that being checked somewhere?
>
> Dave

Yes, this is done by generic vhost and vhost-user code on device init.
In vhost_user_backend_init function if backend doesn't support
VHOST_USER_PROTOCOL_F_LOG_SHMFD protocol feature migration blocker
is assigned to device. And in vhost_dev_init there is one more check
for VHOST_F_LOG_ALL feature that assigns another migration blocker if
this is not supported.

>
>>>
>>>
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    static const VMStateDescription vuf_vmstate = {
>>>>        .name = "vhost-user-fs",
>>>> -    .unmigratable = 1,
>>>> +    .minimum_version_id = 0,
>>>> +    .version_id = 0,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    },
>>>> +   .pre_save = vhost_user_fs_pre_save,
>>>>    };
>>>>    static Property vuf_properties[] = {
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 88ecf86ac8..9a229ea884 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -477,6 +477,11 @@
>>>>    #                    will be handled faster.  This is a performance feature and
>>>>    #                    should not affect the correctness of postcopy migration.
>>>>    #                    (since 7.1)
>>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of
>>>> +#                 vhost-user-fs devices, this should be enabled only when
>>>> +#                 backend can preserve local FUSE state e.g. for qemu update
>>>> +#                 when dst reconects to the same endpoints after migration.
>>>> +#                 (since 8.0)
>>>>    #
>>>>    # Features:
>>>>    # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>>>> @@ -492,7 +497,7 @@
>>>>               'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>>>>               { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>>>               'validate-uuid', 'background-snapshot',
>>>> -           'zero-copy-send', 'postcopy-preempt'] }
>>>> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
>>> I kind of dislike that it's such a specific flag. Is only vhost-user-fs
>>> ever going to be affected? Any way to put it in a way that is more generic?
>> Here I agree with you: I would prefer less narrow naming too. But I
>> didn't manage to come up with one. Looks like many other vhost-user
>> devices could benefit from this so maybe "vhost-user-stateless" or
>> something like this would be better.
>> I'm not sure that other types of devices could handle reconnect to
>> the old endpoint as easy as vhost-user-fs, but anyway the support for
>> this flag needs to be implemented for each device individually.
>> What do you think? Any ideas would be appreciated.
>>
>>>
>>>>    ##
>>>>    # @MigrationCapabilityStatus:
>>>> -- 
>>>> 2.34.1

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>
> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> qemu update without stopping the VM. In most cases that makes sense
> because qemu has no way to transfer FUSE session state.
>
> But we can give an option to orchestrator to override this if it can
> guarantee that state will be preserved (e.g. it uses migration to
> update qemu and dst will run on the same host as src and use the same
> socket endpoints).
>
> This patch keeps default behavior that prevents migration with such devices
> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>
> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> ---
>  hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>  qapi/migration.json       |  7 ++++++-
>  2 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> index f5049735ac..13d920423e 100644
> --- a/hw/virtio/vhost-user-fs.c
> +++ b/hw/virtio/vhost-user-fs.c
> @@ -24,6 +24,7 @@
>  #include "hw/virtio/vhost-user-fs.h"
>  #include "monitor/monitor.h"
>  #include "sysemu/sysemu.h"
> +#include "migration/migration.h"
>
>  static const int user_feature_bits[] = {
>      VIRTIO_F_VERSION_1,
> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>      return &fs->vhost_dev;
>  }
>
> +static int vhost_user_fs_pre_save(void *opaque)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> +                     "state of backend to be preserved. If orchestrator can "
> +                     "guarantee this (e.g. dst connects to the same backend "
> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> +                     "migration capability to true to enable migration.");
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
>  static const VMStateDescription vuf_vmstate = {
>      .name = "vhost-user-fs",
> -    .unmigratable = 1,
> +    .minimum_version_id = 0,
> +    .version_id = 0,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_VIRTIO_DEVICE,
> +        VMSTATE_END_OF_LIST()
> +    },
> +   .pre_save = vhost_user_fs_pre_save,
>  };

Will it be possible to extend this vmstate when virtiofsd adds support
for stateful migration without breaking migration compatibility?

If not, then I think a marker field should be added to the vmstate:
0 - stateless/reconnect migration (the approach you're adding in this patch)
1 - stateful migration (future virtiofsd feature)

When the field is 0 there are no further vmstate fields and we trust
that the destination vhost-user-fs server already has the necessary
state.

When the field is 1 there are additional vmstate fields that contain
the virtiofsd state.

The goal is for QEMU to support 3 migration modes, depending on the
vhost-user-fs server:
1. No migration support.
2. Stateless migration.
3. Stateful migration.

>
>  static Property vuf_properties[] = {
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 88ecf86ac8..9a229ea884 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -477,6 +477,11 @@
>  #                    will be handled faster.  This is a performance feature and
>  #                    should not affect the correctness of postcopy migration.
>  #                    (since 7.1)
> +# @vhost-user-fs: If enabled, the migration process will allow migration of
> +#                 vhost-user-fs devices, this should be enabled only when
> +#                 backend can preserve local FUSE state e.g. for qemu update
> +#                 when dst reconects to the same endpoints after migration.
> +#                 (since 8.0)

This is global but a guest can have multiple vhost-user-fs devices
connected to different servers.

I would add a qdev property to the device instead of introducing a
migration capability. The property would enable "stateless migration".
When the property is not set, migration would be prohibited.

>  #
>  # Features:
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> @@ -492,7 +497,7 @@
>             'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
> -           'zero-copy-send', 'postcopy-preempt'] }
> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
>
>  ##
>  # @MigrationCapabilityStatus:
> --
> 2.34.1
>
>

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 18/01/2023 17:52, Stefan Hajnoczi wrote:
> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>> qemu update without stopping the VM. In most cases that makes sense
>> because qemu has no way to transfer FUSE session state.
>>
>> But we can give an option to orchestrator to override this if it can
>> guarantee that state will be preserved (e.g. it uses migration to
>> update qemu and dst will run on the same host as src and use the same
>> socket endpoints).
>>
>> This patch keeps default behavior that prevents migration with such devices
>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>
>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>> ---
>>   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>   qapi/migration.json       |  7 ++++++-
>>   2 files changed, 30 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
>> index f5049735ac..13d920423e 100644
>> --- a/hw/virtio/vhost-user-fs.c
>> +++ b/hw/virtio/vhost-user-fs.c
>> @@ -24,6 +24,7 @@
>>   #include "hw/virtio/vhost-user-fs.h"
>>   #include "monitor/monitor.h"
>>   #include "sysemu/sysemu.h"
>> +#include "migration/migration.h"
>>
>>   static const int user_feature_bits[] = {
>>       VIRTIO_F_VERSION_1,
>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>>       return &fs->vhost_dev;
>>   }
>>
>> +static int vhost_user_fs_pre_save(void *opaque)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +
>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>> +                     "state of backend to be preserved. If orchestrator can "
>> +                     "guarantee this (e.g. dst connects to the same backend "
>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>> +                     "migration capability to true to enable migration.");
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>>   static const VMStateDescription vuf_vmstate = {
>>       .name = "vhost-user-fs",
>> -    .unmigratable = 1,
>> +    .minimum_version_id = 0,
>> +    .version_id = 0,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_VIRTIO_DEVICE,
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +   .pre_save = vhost_user_fs_pre_save,
>>   };
> Will it be possible to extend this vmstate when virtiofsd adds support
> for stateful migration without breaking migration compatibility?
>
> If not, then I think a marker field should be added to the vmstate:
> 0 - stateless/reconnect migration (the approach you're adding in this patch)
> 1 - stateful migration (future virtiofsd feature)
>
> When the field is 0 there are no further vmstate fields and we trust
> that the destination vhost-user-fs server already has the necessary
> state.
>
> When the field is 1 there are additional vmstate fields that contain
> the virtiofsd state.
>
> The goal is for QEMU to support 3 migration modes, depending on the
> vhost-user-fs server:
> 1. No migration support.
> 2. Stateless migration.
> 3. Stateful migration.

Sure. These vmstate fields are very generic and mandatory for any
virtio device. If in future more state can be transfer in migration
stream the vmstate can be extended with additional fields. This can
be done with new subsections and/or bumping version_id.

The main purpose of this patch is to allow update VM to newer version
of qemu via local migration without disruption to guest. And future
versions hopefully could pack more state from external environment
to migration stream.


>
>>   static Property vuf_properties[] = {
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 88ecf86ac8..9a229ea884 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -477,6 +477,11 @@
>>   #                    will be handled faster.  This is a performance feature and
>>   #                    should not affect the correctness of postcopy migration.
>>   #                    (since 7.1)
>> +# @vhost-user-fs: If enabled, the migration process will allow migration of
>> +#                 vhost-user-fs devices, this should be enabled only when
>> +#                 backend can preserve local FUSE state e.g. for qemu update
>> +#                 when dst reconects to the same endpoints after migration.
>> +#                 (since 8.0)
> This is global but a guest can have multiple vhost-user-fs devices
> connected to different servers.
AFAIK vhost-user requires unix socket and memory shared from guest so
devices can't be connected to different servers, just to different
endpoints on current host.

>
> I would add a qdev property to the device instead of introducing a
> migration capability. The property would enable "stateless migration".
> When the property is not set, migration would be prohibited.
I did thought about that, but this is really not a property of device,
this is the capability of management software and applies to exactly one
particular migration process that it initiates. It should not persist
across migration or be otherwise stored in device.

The idea here is that orchestrator can ensure destination qemu will
run on the same host, will reconnect to the same unix sockets and only
then sets the flag (because inside qemu we can't know anything about
the destination).
This is somewhat similar to ignore-shared migration capability when
qemu avoids saving and loading guest memory that is stores in shmem
because it will be picked up by destination process right where source
left it.

>>   #
>>   # Features:
>>   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>> @@ -492,7 +497,7 @@
>>              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>>              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>              'validate-uuid', 'background-snapshot',
>> -           'zero-copy-send', 'postcopy-preempt'] }
>> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
>>
>>   ##
>>   # @MigrationCapabilityStatus:
>> --
>> 2.34.1
>>
>>

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>
> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
> > On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> >> qemu update without stopping the VM. In most cases that makes sense
> >> because qemu has no way to transfer FUSE session state.
> >>
> >> But we can give an option to orchestrator to override this if it can
> >> guarantee that state will be preserved (e.g. it uses migration to
> >> update qemu and dst will run on the same host as src and use the same
> >> socket endpoints).
> >>
> >> This patch keeps default behavior that prevents migration with such devices
> >> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> >>
> >> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> >> ---
> >>   hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> >>   qapi/migration.json       |  7 ++++++-
> >>   2 files changed, 30 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> >> index f5049735ac..13d920423e 100644
> >> --- a/hw/virtio/vhost-user-fs.c
> >> +++ b/hw/virtio/vhost-user-fs.c
> >> @@ -24,6 +24,7 @@
> >>   #include "hw/virtio/vhost-user-fs.h"
> >>   #include "monitor/monitor.h"
> >>   #include "sysemu/sysemu.h"
> >> +#include "migration/migration.h"
> >>
> >>   static const int user_feature_bits[] = {
> >>       VIRTIO_F_VERSION_1,
> >> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> >>       return &fs->vhost_dev;
> >>   }
> >>
> >> +static int vhost_user_fs_pre_save(void *opaque)
> >> +{
> >> +    MigrationState *s = migrate_get_current();
> >> +
> >> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> >> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> >> +                     "state of backend to be preserved. If orchestrator can "
> >> +                     "guarantee this (e.g. dst connects to the same backend "
> >> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> >> +                     "migration capability to true to enable migration.");
> >> +        return -1;
> >> +    }
> >> +
> >> +    return 0;
> >> +}
> >> +
> >>   static const VMStateDescription vuf_vmstate = {
> >>       .name = "vhost-user-fs",
> >> -    .unmigratable = 1,
> >> +    .minimum_version_id = 0,
> >> +    .version_id = 0,
> >> +    .fields = (VMStateField[]) {
> >> +        VMSTATE_VIRTIO_DEVICE,
> >> +        VMSTATE_END_OF_LIST()
> >> +    },
> >> +   .pre_save = vhost_user_fs_pre_save,
> >>   };
> > Will it be possible to extend this vmstate when virtiofsd adds support
> > for stateful migration without breaking migration compatibility?
> >
> > If not, then I think a marker field should be added to the vmstate:
> > 0 - stateless/reconnect migration (the approach you're adding in this patch)
> > 1 - stateful migration (future virtiofsd feature)
> >
> > When the field is 0 there are no further vmstate fields and we trust
> > that the destination vhost-user-fs server already has the necessary
> > state.
> >
> > When the field is 1 there are additional vmstate fields that contain
> > the virtiofsd state.
> >
> > The goal is for QEMU to support 3 migration modes, depending on the
> > vhost-user-fs server:
> > 1. No migration support.
> > 2. Stateless migration.
> > 3. Stateful migration.
>
> Sure. These vmstate fields are very generic and mandatory for any
> virtio device. If in future more state can be transfer in migration
> stream the vmstate can be extended with additional fields. This can
> be done with new subsections and/or bumping version_id.

My concern is that the vmstate introduced in this patch may be
unusable when stateful migration is added. So additional compatibility
code will need to be introduced to make your stateless migration
continue working with extended vmstate.

By adding a marker field in this patch it should be possible to
continue using the same vmstate for stateless migration without adding
extra compatibility code in the future.

> The main purpose of this patch is to allow update VM to newer version
> of qemu via local migration without disruption to guest. And future
> versions hopefully could pack more state from external environment
> to migration stream.
>
>
> >
> >>   static Property vuf_properties[] = {
> >> diff --git a/qapi/migration.json b/qapi/migration.json
> >> index 88ecf86ac8..9a229ea884 100644
> >> --- a/qapi/migration.json
> >> +++ b/qapi/migration.json
> >> @@ -477,6 +477,11 @@
> >>   #                    will be handled faster.  This is a performance feature and
> >>   #                    should not affect the correctness of postcopy migration.
> >>   #                    (since 7.1)
> >> +# @vhost-user-fs: If enabled, the migration process will allow migration of
> >> +#                 vhost-user-fs devices, this should be enabled only when
> >> +#                 backend can preserve local FUSE state e.g. for qemu update
> >> +#                 when dst reconects to the same endpoints after migration.
> >> +#                 (since 8.0)
> > This is global but a guest can have multiple vhost-user-fs devices
> > connected to different servers.
> AFAIK vhost-user requires unix socket and memory shared from guest so
> devices can't be connected to different servers, just to different
> endpoints on current host.

Different vhost-user-fs server software. vhost-user-fs is a protocol,
there can be multiple server implementations. These implementations
may have different capabilities (some may not support stateless
migration). So I think stateless migration should be a per-instance
setting, not global.

>
> >
> > I would add a qdev property to the device instead of introducing a
> > migration capability. The property would enable "stateless migration".
> > When the property is not set, migration would be prohibited.
> I did thought about that, but this is really not a property of device,
> this is the capability of management software and applies to exactly one
> particular migration process that it initiates. It should not persist
> across migration or be otherwise stored in device.

I disagree. The vhost-user-fs server software must implement stateless
migration in order for this to work. For example,
https://gitlab.com/virtio-fs/virtiofsd doesn't support stateless
migration as far as I know.

>
> The idea here is that orchestrator can ensure destination qemu will
> run on the same host, will reconnect to the same unix sockets and only
> then sets the flag (because inside qemu we can't know anything about
> the destination).
> This is somewhat similar to ignore-shared migration capability when
> qemu avoids saving and loading guest memory that is stores in shmem
> because it will be picked up by destination process right where source
> left it.
>
> >>   #
> >>   # Features:
> >>   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> >> @@ -492,7 +497,7 @@
> >>              'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
> >>              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
> >>              'validate-uuid', 'background-snapshot',
> >> -           'zero-copy-send', 'postcopy-preempt'] }
> >> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
> >>
> >>   ##
> >>   # @MigrationCapabilityStatus:
> >> --
> >> 2.34.1
> >>
> >>

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 19/01/2023 16:30, Stefan Hajnoczi wrote:
> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>>>> qemu update without stopping the VM. In most cases that makes sense
>>>> because qemu has no way to transfer FUSE session state.
>>>>
>>>> But we can give an option to orchestrator to override this if it can
>>>> guarantee that state will be preserved (e.g. it uses migration to
>>>> update qemu and dst will run on the same host as src and use the same
>>>> socket endpoints).
>>>>
>>>> This patch keeps default behavior that prevents migration with such devices
>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>>>
>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>>>> ---
>>>>    hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>>>    qapi/migration.json       |  7 ++++++-
>>>>    2 files changed, 30 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
>>>> index f5049735ac..13d920423e 100644
>>>> --- a/hw/virtio/vhost-user-fs.c
>>>> +++ b/hw/virtio/vhost-user-fs.c
>>>> @@ -24,6 +24,7 @@
>>>>    #include "hw/virtio/vhost-user-fs.h"
>>>>    #include "monitor/monitor.h"
>>>>    #include "sysemu/sysemu.h"
>>>> +#include "migration/migration.h"
>>>>
>>>>    static const int user_feature_bits[] = {
>>>>        VIRTIO_F_VERSION_1,
>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>>>>        return &fs->vhost_dev;
>>>>    }
>>>>
>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>> +{
>>>> +    MigrationState *s = migrate_get_current();
>>>> +
>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>> +                     "migration capability to true to enable migration.");
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    static const VMStateDescription vuf_vmstate = {
>>>>        .name = "vhost-user-fs",
>>>> -    .unmigratable = 1,
>>>> +    .minimum_version_id = 0,
>>>> +    .version_id = 0,
>>>> +    .fields = (VMStateField[]) {
>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>> +        VMSTATE_END_OF_LIST()
>>>> +    },
>>>> +   .pre_save = vhost_user_fs_pre_save,
>>>>    };
>>> Will it be possible to extend this vmstate when virtiofsd adds support
>>> for stateful migration without breaking migration compatibility?
>>>
>>> If not, then I think a marker field should be added to the vmstate:
>>> 0 - stateless/reconnect migration (the approach you're adding in this patch)
>>> 1 - stateful migration (future virtiofsd feature)
>>>
>>> When the field is 0 there are no further vmstate fields and we trust
>>> that the destination vhost-user-fs server already has the necessary
>>> state.
>>>
>>> When the field is 1 there are additional vmstate fields that contain
>>> the virtiofsd state.
>>>
>>> The goal is for QEMU to support 3 migration modes, depending on the
>>> vhost-user-fs server:
>>> 1. No migration support.
>>> 2. Stateless migration.
>>> 3. Stateful migration.
>> Sure. These vmstate fields are very generic and mandatory for any
>> virtio device. If in future more state can be transfer in migration
>> stream the vmstate can be extended with additional fields. This can
>> be done with new subsections and/or bumping version_id.
> My concern is that the vmstate introduced in this patch may be
> unusable when stateful migration is added. So additional compatibility
> code will need to be introduced to make your stateless migration
> continue working with extended vmstate.
>
> By adding a marker field in this patch it should be possible to
> continue using the same vmstate for stateless migration without adding
> extra compatibility code in the future.
I understand, but this fields in vmstate just packs generic virtio
device state that is accessible by qemu. All additional data could be
added later by extra fields. I believe we couldn't pull off any type
of virtio device migration without transferring virtqueues so more
sophisticated types of migration would require adding more data and
not modification to this part of vmstate.

>
>> The main purpose of this patch is to allow update VM to newer version
>> of qemu via local migration without disruption to guest. And future
>> versions hopefully could pack more state from external environment
>> to migration stream.
>>
>>
>>>>    static Property vuf_properties[] = {
>>>> diff --git a/qapi/migration.json b/qapi/migration.json
>>>> index 88ecf86ac8..9a229ea884 100644
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -477,6 +477,11 @@
>>>>    #                    will be handled faster.  This is a performance feature and
>>>>    #                    should not affect the correctness of postcopy migration.
>>>>    #                    (since 7.1)
>>>> +# @vhost-user-fs: If enabled, the migration process will allow migration of
>>>> +#                 vhost-user-fs devices, this should be enabled only when
>>>> +#                 backend can preserve local FUSE state e.g. for qemu update
>>>> +#                 when dst reconects to the same endpoints after migration.
>>>> +#                 (since 8.0)
>>> This is global but a guest can have multiple vhost-user-fs devices
>>> connected to different servers.
>> AFAIK vhost-user requires unix socket and memory shared from guest so
>> devices can't be connected to different servers, just to different
>> endpoints on current host.
> Different vhost-user-fs server software. vhost-user-fs is a protocol,
> there can be multiple server implementations. These implementations
> may have different capabilities (some may not support stateless
> migration). So I think stateless migration should be a per-instance
> setting, not global.

 From qemu point of view we have no way of knowing which endpoints
implement stateless migration and if they can perform it at the
moment or need some additional setup to handle it, so after all the
final decision to migrate statelessly originates from the orchestrator
that knows backends states better.
Setting per-device flag can be more confusing because qemu can't know
or control backends states and capabilities. So I'd rather hand off
this task completely to the orchestrator and not try to split the
responsibility between it and qemu.

>
>>> I would add a qdev property to the device instead of introducing a
>>> migration capability. The property would enable "stateless migration".
>>> When the property is not set, migration would be prohibited.
>> I did thought about that, but this is really not a property of device,
>> this is the capability of management software and applies to exactly one
>> particular migration process that it initiates. It should not persist
>> across migration or be otherwise stored in device.
> I disagree. The vhost-user-fs server software must implement stateless
> migration in order for this to work. For example,
> https://gitlab.com/virtio-fs/virtiofsd doesn't support stateless
> migration as far as I know.
Yes the implementation in backend is necessary, but this is backend
property and is not related at all to device in qemu. So management
software should know better what backends are capable of.
In fact I'm working now on support for this in rust viftiofsd (my rust
skills are not good enough yet do this quickly) and plan to propose
spec patch for exposing "reconnect" vhost user backend capability to
have standard way for orchestrators to check for it.

>
>> The idea here is that orchestrator can ensure destination qemu will
>> run on the same host, will reconnect to the same unix sockets and only
>> then sets the flag (because inside qemu we can't know anything about
>> the destination).
>> This is somewhat similar to ignore-shared migration capability when
>> qemu avoids saving and loading guest memory that is stores in shmem
>> because it will be picked up by destination process right where source
>> left it.
>>
>>>>    #
>>>>    # Features:
>>>>    # @unstable: Members @x-colo and @x-ignore-shared are experimental.
>>>> @@ -492,7 +497,7 @@
>>>>               'dirty-bitmaps', 'postcopy-blocktime', 'late-block-activate',
>>>>               { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>>>               'validate-uuid', 'background-snapshot',
>>>> -           'zero-copy-send', 'postcopy-preempt'] }
>>>> +           'zero-copy-send', 'postcopy-preempt', 'vhost-user-fs'] }
>>>>
>>>>    ##
>>>>    # @MigrationCapabilityStatus:
>>>> --
>>>> 2.34.1
>>>>
>>>>

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>
> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
> > On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
> >>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> >>>> qemu update without stopping the VM. In most cases that makes sense
> >>>> because qemu has no way to transfer FUSE session state.
> >>>>
> >>>> But we can give an option to orchestrator to override this if it can
> >>>> guarantee that state will be preserved (e.g. it uses migration to
> >>>> update qemu and dst will run on the same host as src and use the same
> >>>> socket endpoints).
> >>>>
> >>>> This patch keeps default behavior that prevents migration with such devices
> >>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> >>>>
> >>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> >>>> ---
> >>>>    hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> >>>>    qapi/migration.json       |  7 ++++++-
> >>>>    2 files changed, 30 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> >>>> index f5049735ac..13d920423e 100644
> >>>> --- a/hw/virtio/vhost-user-fs.c
> >>>> +++ b/hw/virtio/vhost-user-fs.c
> >>>> @@ -24,6 +24,7 @@
> >>>>    #include "hw/virtio/vhost-user-fs.h"
> >>>>    #include "monitor/monitor.h"
> >>>>    #include "sysemu/sysemu.h"
> >>>> +#include "migration/migration.h"
> >>>>
> >>>>    static const int user_feature_bits[] = {
> >>>>        VIRTIO_F_VERSION_1,
> >>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> >>>>        return &fs->vhost_dev;
> >>>>    }
> >>>>
> >>>> +static int vhost_user_fs_pre_save(void *opaque)
> >>>> +{
> >>>> +    MigrationState *s = migrate_get_current();
> >>>> +
> >>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> >>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> >>>> +                     "state of backend to be preserved. If orchestrator can "
> >>>> +                     "guarantee this (e.g. dst connects to the same backend "
> >>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> >>>> +                     "migration capability to true to enable migration.");
> >>>> +        return -1;
> >>>> +    }
> >>>> +
> >>>> +    return 0;
> >>>> +}
> >>>> +
> >>>>    static const VMStateDescription vuf_vmstate = {
> >>>>        .name = "vhost-user-fs",
> >>>> -    .unmigratable = 1,
> >>>> +    .minimum_version_id = 0,
> >>>> +    .version_id = 0,
> >>>> +    .fields = (VMStateField[]) {
> >>>> +        VMSTATE_VIRTIO_DEVICE,
> >>>> +        VMSTATE_END_OF_LIST()
> >>>> +    },
> >>>> +   .pre_save = vhost_user_fs_pre_save,
> >>>>    };
> >>> Will it be possible to extend this vmstate when virtiofsd adds support
> >>> for stateful migration without breaking migration compatibility?
> >>>
> >>> If not, then I think a marker field should be added to the vmstate:
> >>> 0 - stateless/reconnect migration (the approach you're adding in this patch)
> >>> 1 - stateful migration (future virtiofsd feature)
> >>>
> >>> When the field is 0 there are no further vmstate fields and we trust
> >>> that the destination vhost-user-fs server already has the necessary
> >>> state.
> >>>
> >>> When the field is 1 there are additional vmstate fields that contain
> >>> the virtiofsd state.
> >>>
> >>> The goal is for QEMU to support 3 migration modes, depending on the
> >>> vhost-user-fs server:
> >>> 1. No migration support.
> >>> 2. Stateless migration.
> >>> 3. Stateful migration.
> >> Sure. These vmstate fields are very generic and mandatory for any
> >> virtio device. If in future more state can be transfer in migration
> >> stream the vmstate can be extended with additional fields. This can
> >> be done with new subsections and/or bumping version_id.
> > My concern is that the vmstate introduced in this patch may be
> > unusable when stateful migration is added. So additional compatibility
> > code will need to be introduced to make your stateless migration
> > continue working with extended vmstate.
> >
> > By adding a marker field in this patch it should be possible to
> > continue using the same vmstate for stateless migration without adding
> > extra compatibility code in the future.
> I understand, but this fields in vmstate just packs generic virtio
> device state that is accessible by qemu. All additional data could be
> added later by extra fields. I believe we couldn't pull off any type
> of virtio device migration without transferring virtqueues so more
> sophisticated types of migration would require adding more data and
> not modification to this part of vmstate.

What I'm saying is that your patch could define the vmstate such that
it that contains a field to differentiate between stateless and
stateful migration. That way QEMU versions that only support stateless
migration (this patch) will be able to migrate to future QEMU versions
that support both stateless and stateful without compatibility issues.

I'm not sure if my suggestion to add a marker field to vuf_vmstate is
the best way to do this, but have you thought of how to handle the
future addition of stateful migration to the vmstate without breaking
stateless vmstates? Maybe David Gilbert has a suggestion for how to do
this cleanly.

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 19/01/2023 18:02, Stefan Hajnoczi wrote:
> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
>>>>>> qemu update without stopping the VM. In most cases that makes sense
>>>>>> because qemu has no way to transfer FUSE session state.
>>>>>>
>>>>>> But we can give an option to orchestrator to override this if it can
>>>>>> guarantee that state will be preserved (e.g. it uses migration to
>>>>>> update qemu and dst will run on the same host as src and use the same
>>>>>> socket endpoints).
>>>>>>
>>>>>> This patch keeps default behavior that prevents migration with such devices
>>>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
>>>>>>
>>>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
>>>>>> ---
>>>>>>     hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
>>>>>>     qapi/migration.json       |  7 ++++++-
>>>>>>     2 files changed, 30 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
>>>>>> index f5049735ac..13d920423e 100644
>>>>>> --- a/hw/virtio/vhost-user-fs.c
>>>>>> +++ b/hw/virtio/vhost-user-fs.c
>>>>>> @@ -24,6 +24,7 @@
>>>>>>     #include "hw/virtio/vhost-user-fs.h"
>>>>>>     #include "monitor/monitor.h"
>>>>>>     #include "sysemu/sysemu.h"
>>>>>> +#include "migration/migration.h"
>>>>>>
>>>>>>     static const int user_feature_bits[] = {
>>>>>>         VIRTIO_F_VERSION_1,
>>>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
>>>>>>         return &fs->vhost_dev;
>>>>>>     }
>>>>>>
>>>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>>>> +{
>>>>>> +    MigrationState *s = migrate_get_current();
>>>>>> +
>>>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>>>> +                     "migration capability to true to enable migration.");
>>>>>> +        return -1;
>>>>>> +    }
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>>     static const VMStateDescription vuf_vmstate = {
>>>>>>         .name = "vhost-user-fs",
>>>>>> -    .unmigratable = 1,
>>>>>> +    .minimum_version_id = 0,
>>>>>> +    .version_id = 0,
>>>>>> +    .fields = (VMStateField[]) {
>>>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>>>> +        VMSTATE_END_OF_LIST()
>>>>>> +    },
>>>>>> +   .pre_save = vhost_user_fs_pre_save,
>>>>>>     };
>>>>> Will it be possible to extend this vmstate when virtiofsd adds support
>>>>> for stateful migration without breaking migration compatibility?
>>>>>
>>>>> If not, then I think a marker field should be added to the vmstate:
>>>>> 0 - stateless/reconnect migration (the approach you're adding in this patch)
>>>>> 1 - stateful migration (future virtiofsd feature)
>>>>>
>>>>> When the field is 0 there are no further vmstate fields and we trust
>>>>> that the destination vhost-user-fs server already has the necessary
>>>>> state.
>>>>>
>>>>> When the field is 1 there are additional vmstate fields that contain
>>>>> the virtiofsd state.
>>>>>
>>>>> The goal is for QEMU to support 3 migration modes, depending on the
>>>>> vhost-user-fs server:
>>>>> 1. No migration support.
>>>>> 2. Stateless migration.
>>>>> 3. Stateful migration.
>>>> Sure. These vmstate fields are very generic and mandatory for any
>>>> virtio device. If in future more state can be transfer in migration
>>>> stream the vmstate can be extended with additional fields. This can
>>>> be done with new subsections and/or bumping version_id.
>>> My concern is that the vmstate introduced in this patch may be
>>> unusable when stateful migration is added. So additional compatibility
>>> code will need to be introduced to make your stateless migration
>>> continue working with extended vmstate.
>>>
>>> By adding a marker field in this patch it should be possible to
>>> continue using the same vmstate for stateless migration without adding
>>> extra compatibility code in the future.
>> I understand, but this fields in vmstate just packs generic virtio
>> device state that is accessible by qemu. All additional data could be
>> added later by extra fields. I believe we couldn't pull off any type
>> of virtio device migration without transferring virtqueues so more
>> sophisticated types of migration would require adding more data and
>> not modification to this part of vmstate.
> What I'm saying is that your patch could define the vmstate such that
> it that contains a field to differentiate between stateless and
> stateful migration. That way QEMU versions that only support stateless
> migration (this patch) will be able to migrate to future QEMU versions
> that support both stateless and stateful without compatibility issues.
I double-checked migration documentation for subsections at
https://www.qemu.org/docs/master/devel/migration.html#subsections
and believe it perfectly describes our case: virtio device state
should always be transferred both in stateless or stateful migration.
With stateful one we would add new subsection with extra data that
will be transferred only if stateless capability is not set. We can
connect this subsection to device property and machine type if we
need to.
On the receiving side we always need basic virtio device state and
newer versions will be able to load extra data from subsection if it
is present, while older versions will be still able to accept the
migrations that were initiated from new versions with stateless flag
set and don't have extra subsection.

The only scenario that will fail is older qemu won't be able to load
new migration stream with additional subsection that it can't
understand - this is general limitation of migration compatibility.
So we can't completely protect older versions from future migration
stream format because we don't know what will be in that stream
but looks like we have all the tools to maintain compatibility
reasonably wide.
> I'm not sure if my suggestion to add a marker field to vuf_vmstate is
> the best way to do this, but have you thought of how to handle the
> future addition of stateful migration to the vmstate without breaking
> stateless vmstates? Maybe David Gilbert has a suggestion for how to do
> this cleanly.
>
> Stefan

I think we'd be better without a new marker because migration code
has standard generic way of solving such puzzles that I described
above. So adding new marker would go against existing practice.
But if you could show me where I missed something I'll be grateful
and will fix it to avoid potential problems.
I'd also be happy to know the opinion of Dr. David Alan Gilbert.

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Juan Quintela 3 years ago

Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:

Hi

Sorry to come so late into the discussion.


>>>>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>>>>> +{
>>>>>>> +    MigrationState *s = migrate_get_current();
>>>>>>> +
>>>>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>>>>> +                     "migration capability to true to enable migration.");
>>>>>>> +        return -1;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>>     static const VMStateDescription vuf_vmstate = {
>>>>>>>         .name = "vhost-user-fs",
>>>>>>> -    .unmigratable = 1,
>>>>>>> +    .minimum_version_id = 0,
>>>>>>> +    .version_id = 0,
>>>>>>> +    .fields = (VMStateField[]) {
>>>>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>>>>> +        VMSTATE_END_OF_LIST()
>>>>>>> +    },
>>>>>>> +   .pre_save = vhost_user_fs_pre_save,
>>>>>>>     };

I don't object to extend the vmstate this way.

But I object to the migration capability for several reasons:
- This feature has _nothing_ to do with migration, the problem, what it
  describes, etc is related to vhost_user_fs.
- The number of migration capabilities is limited
  And we add checks to see if they are valid, consistent, etc
  see qemu/migration/migration.c:migrate_caps_check()
- As Stefan says, we can have several vhost_user_fs devices, and each
  one should know if it can migrate or not.
- We have to options for the orchestator:
  * it knows that all the vhost_user_fs devices can be migration
    Then it can add a property to each vhost_user_fs device
  * it don't know it
    Then it is a good idea that we are not migrating this VM.

> I think we'd be better without a new marker because migration code
> has standard generic way of solving such puzzles that I described
> above. So adding new marker would go against existing practice.
> But if you could show me where I missed something I'll be grateful
> and will fix it to avoid potential problems.
> I'd also be happy to know the opinion of Dr. David Alan Gilbert.

If everybody agrees that any vhost_user_fs device is going to have a
virtio device, then I agree with you that the marker is not needed at
this point.

Once told that, I think that you are making your live harder in the
future when you add the other migratable devices.

I am assuming here that your "underlying device" is:

enum VhostUserFSType {
    VHOST_USER_NO_MIGRATABLE = 0,
    // The one we are doing here
    VHOST_USER_EXTERNAL_MIGRATABLE,
    // The one you describe for the future
    VHOST_USER_INTERNAL_MIGRATABLE,
};

struct VHostUserFS {
    /*< private >*/
    VirtIODevice parent;
    VHostUserFSConf conf;
    struct vhost_virtqueue *vhost_vqs;
    struct vhost_dev vhost_dev;
    VhostUserState vhost_user;
    VirtQueue **req_vqs;
    VirtQueue *hiprio_vq;
    int32_t bootindex;
    enum migration_type;
    /*< public >*/
};


static int vhost_user_fs_pre_save(void *opaque)
{
    VHostUserFS *s = opaque;

    if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) {
        error_report("Migration of vhost-user-fs devices requires internal FUSE "
                     "state of backend to be preserved. If orchestrator can "
                     "guarantee this (e.g. dst connects to the same backend "
                     "instance or backend state is migrated) set 'vhost-user-fs' "
                     "migration capability to true to enable migration.");
        return -1;
    }
    if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) {
        return 0;
    }
    if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) {
        error_report("still not implemented");
        return -1;
    }
    assert("we don't reach here");
}

Your initial vmstateDescription

static const VMStateDescription vuf_vmstate = {
    .name = "vhost-user-fs",
    .unmigratable = 1,
    .minimum_version_id = 0,
    .version_id = 0,
    .fields = (VMStateField[]) {
        VMSTATE_INT8(migration_type, struct VHostUserFS),
        VMSTATE_VIRTIO_DEVICE,
        VMSTATE_END_OF_LIST()
    },
    .pre_save = vhost_user_fs_pre_save,
};

And later you change it to something like:

static bool vhost_fs_user_internal_state_needed(void *opaque)
{
    VHostUserFS *s = opaque;

    return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE;
}

static const VMStateDescription vmstate_vhost_user_fs_internal_sub = {
    .name = "vhost-user-fs/internal",
    .version_id = 1,
    .minimum_version_id = 1,
    .needed = &vhost_fs_user_internal_state_needed,
    .fields = (VMStateField[]) {
        .... // Whatever
        VMSTATE_END_OF_LIST()
    }
};

static const VMStateDescription vuf_vmstate = {
    .name = "vhost-user-fs",
    .minimum_version_id = 0,
    .version_id = 0,
    .fields = (VMStateField[]) {
        VMSTATE_INT8(migration_type, struct VHostUserFS),
        VMSTATE_VIRTIO_DEVICE,
        VMSTATE_END_OF_LIST()
    },
    .pre_save = vhost_user_fs_pre_save,
    .subsections = (const VMStateDescription*[]) {
        &vmstate_vhost_user_fs_internal_sub,
        NULL
    }
};

And you are done.

I will propose to use a property to set migration_type, but I didn't
want to write the code right now.

I think that this proposal will make Stephan happy, and it is just
adding and extra uint8_t that is helpul to implement everything.

Later, Juan.

PD.  One of the few things that Pascal got right and C got completely
     wrong were pascal variant registers vs C union's.  If you have a
     union, if should be "required" that there is a field in the
     enclosing struct that specifies what element of the union we have.
     This is exactly that case.

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 01/02/2023 16:26, Juan Quintela wrote:
> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> Hi
>
> Sorry to come so late into the discussion.

You are just in time, I'm working on v2 of this patch right now :-)

>
>>>>>>>> +static int vhost_user_fs_pre_save(void *opaque)
>>>>>>>> +{
>>>>>>>> +    MigrationState *s = migrate_get_current();
>>>>>>>> +
>>>>>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
>>>>>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>>>>>> +                     "state of backend to be preserved. If orchestrator can "
>>>>>>>> +                     "guarantee this (e.g. dst connects to the same backend "
>>>>>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
>>>>>>>> +                     "migration capability to true to enable migration.");
>>>>>>>> +        return -1;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>      static const VMStateDescription vuf_vmstate = {
>>>>>>>>          .name = "vhost-user-fs",
>>>>>>>> -    .unmigratable = 1,
>>>>>>>> +    .minimum_version_id = 0,
>>>>>>>> +    .version_id = 0,
>>>>>>>> +    .fields = (VMStateField[]) {
>>>>>>>> +        VMSTATE_VIRTIO_DEVICE,
>>>>>>>> +        VMSTATE_END_OF_LIST()
>>>>>>>> +    },
>>>>>>>> +   .pre_save = vhost_user_fs_pre_save,
>>>>>>>>      };
> I don't object to extend the vmstate this way.
>
> But I object to the migration capability for several reasons:
> - This feature has _nothing_ to do with migration, the problem, what it
>    describes, etc is related to vhost_user_fs.
> - The number of migration capabilities is limited
>    And we add checks to see if they are valid, consistent, etc
>    see qemu/migration/migration.c:migrate_caps_check()
> - As Stefan says, we can have several vhost_user_fs devices, and each
>    one should know if it can migrate or not.
> - We have to options for the orchestator:
>    * it knows that all the vhost_user_fs devices can be migration
>      Then it can add a property to each vhost_user_fs device
>    * it don't know it
>      Then it is a good idea that we are not migrating this VM.
>
>> I think we'd be better without a new marker because migration code
>> has standard generic way of solving such puzzles that I described
>> above. So adding new marker would go against existing practice.
>> But if you could show me where I missed something I'll be grateful
>> and will fix it to avoid potential problems.
>> I'd also be happy to know the opinion of Dr. David Alan Gilbert.
> If everybody agrees that any vhost_user_fs device is going to have a
> virtio device, then I agree with you that the marker is not needed at
> this point.
>
> Once told that, I think that you are making your live harder in the
> future when you add the other migratable devices.
>
> I am assuming here that your "underlying device" is:
>
> enum VhostUserFSType {
>      VHOST_USER_NO_MIGRATABLE = 0,
>      // The one we are doing here
>      VHOST_USER_EXTERNAL_MIGRATABLE,
>      // The one you describe for the future
>      VHOST_USER_INTERNAL_MIGRATABLE,
> };
>
> struct VHostUserFS {
>      /*< private >*/
>      VirtIODevice parent;
>      VHostUserFSConf conf;
>      struct vhost_virtqueue *vhost_vqs;
>      struct vhost_dev vhost_dev;
>      VhostUserState vhost_user;
>      VirtQueue **req_vqs;
>      VirtQueue *hiprio_vq;
>      int32_t bootindex;
>      enum migration_type;
>      /*< public >*/
> };
>
>
> static int vhost_user_fs_pre_save(void *opaque)
> {
>      VHostUserFS *s = opaque;
>
>      if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) {
>          error_report("Migration of vhost-user-fs devices requires internal FUSE "
>                       "state of backend to be preserved. If orchestrator can "
>                       "guarantee this (e.g. dst connects to the same backend "
>                       "instance or backend state is migrated) set 'vhost-user-fs' "
>                       "migration capability to true to enable migration.");
>          return -1;
>      }
>      if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) {
>          return 0;
>      }
>      if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) {
>          error_report("still not implemented");
>          return -1;
>      }
>      assert("we don't reach here");
> }
>
> Your initial vmstateDescription
>
> static const VMStateDescription vuf_vmstate = {
>      .name = "vhost-user-fs",
>      .unmigratable = 1,
>      .minimum_version_id = 0,
>      .version_id = 0,
>      .fields = (VMStateField[]) {
>          VMSTATE_INT8(migration_type, struct VHostUserFS),
>          VMSTATE_VIRTIO_DEVICE,
>          VMSTATE_END_OF_LIST()
>      },
>      .pre_save = vhost_user_fs_pre_save,
> };
>
> And later you change it to something like:
>
> static bool vhost_fs_user_internal_state_needed(void *opaque)
> {
>      VHostUserFS *s = opaque;
>
>      return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE;
> }
>
> static const VMStateDescription vmstate_vhost_user_fs_internal_sub = {
>      .name = "vhost-user-fs/internal",
>      .version_id = 1,
>      .minimum_version_id = 1,
>      .needed = &vhost_fs_user_internal_state_needed,
>      .fields = (VMStateField[]) {
>          .... // Whatever
>          VMSTATE_END_OF_LIST()
>      }
> };
>
> static const VMStateDescription vuf_vmstate = {
>      .name = "vhost-user-fs",
>      .minimum_version_id = 0,
>      .version_id = 0,
>      .fields = (VMStateField[]) {
>          VMSTATE_INT8(migration_type, struct VHostUserFS),
>          VMSTATE_VIRTIO_DEVICE,
>          VMSTATE_END_OF_LIST()
>      },
>      .pre_save = vhost_user_fs_pre_save,
>      .subsections = (const VMStateDescription*[]) {
>          &vmstate_vhost_user_fs_internal_sub,
>          NULL
>      }
> };
>
> And you are done.
>
> I will propose to use a property to set migration_type, but I didn't
> want to write the code right now.
>
> I think that this proposal will make Stephan happy, and it is just
> adding and extra uint8_t that is helpul to implement everything.

That is exactly the approach I'm trying to implement it right now. Single
flag can be convenient for orchestrator but makes it too hard to account in
all cases for all devices on qemu side without breaking future 
compatibility.
So I'm rewriting it with properties.

BTW do you think each vhost-user device should have its own enum of 
migration
types or maybe we could make them common for all device types?

>
> Later, Juan.
>
> PD.  One of the few things that Pascal got right and C got completely
>       wrong were pascal variant registers vs C union's.  If you have a
>       union, if should be "required" that there is a field in the
>       enclosing struct that specifies what element of the union we have.
>       This is exactly that case.
>

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Juan Quintela 3 years ago

Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> On 01/02/2023 16:26, Juan Quintela wrote:
>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
>>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
>>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>> Once told that, I think that you are making your live harder in the
>> future when you add the other migratable devices.
>>
>> I am assuming here that your "underlying device" is:
>>
>> enum VhostUserFSType {
>>      VHOST_USER_NO_MIGRATABLE = 0,
>>      // The one we are doing here
>>      VHOST_USER_EXTERNAL_MIGRATABLE,
>>      // The one you describe for the future
>>      VHOST_USER_INTERNAL_MIGRATABLE,
>> };
>>
>> struct VHostUserFS {
>>      /*< private >*/
>>      VirtIODevice parent;
>>      VHostUserFSConf conf;
>>      struct vhost_virtqueue *vhost_vqs;
>>      struct vhost_dev vhost_dev;
>>      VhostUserState vhost_user;
>>      VirtQueue **req_vqs;
>>      VirtQueue *hiprio_vq;
>>      int32_t bootindex;
>>      enum migration_type;
>>      /*< public >*/
>> };
>>
>>
>> static int vhost_user_fs_pre_save(void *opaque)
>> {
>>      VHostUserFS *s = opaque;
>>
>>      if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) {
>>          error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>                       "state of backend to be preserved. If orchestrator can "
>>                       "guarantee this (e.g. dst connects to the same backend "
>>                       "instance or backend state is migrated) set 'vhost-user-fs' "
>>                       "migration capability to true to enable migration.");
>>          return -1;
>>      }
>>      if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) {
>>          return 0;
>>      }
>>      if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) {
>>          error_report("still not implemented");
>>          return -1;
>>      }
>>      assert("we don't reach here");
>> }
>>
>> Your initial vmstateDescription
>>
>> static const VMStateDescription vuf_vmstate = {
>>      .name = "vhost-user-fs",
>>      .unmigratable = 1,
>>      .minimum_version_id = 0,
>>      .version_id = 0,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_INT8(migration_type, struct VHostUserFS),
>>          VMSTATE_VIRTIO_DEVICE,
>>          VMSTATE_END_OF_LIST()
>>      },
>>      .pre_save = vhost_user_fs_pre_save,
>> };
>>
>> And later you change it to something like:
>>
>> static bool vhost_fs_user_internal_state_needed(void *opaque)
>> {
>>      VHostUserFS *s = opaque;
>>
>>      return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE;
>> }
>>
>> static const VMStateDescription vmstate_vhost_user_fs_internal_sub = {
>>      .name = "vhost-user-fs/internal",
>>      .version_id = 1,
>>      .minimum_version_id = 1,
>>      .needed = &vhost_fs_user_internal_state_needed,
>>      .fields = (VMStateField[]) {
>>          .... // Whatever
>>          VMSTATE_END_OF_LIST()
>>      }
>> };
>>
>> static const VMStateDescription vuf_vmstate = {
>>      .name = "vhost-user-fs",
>>      .minimum_version_id = 0,
>>      .version_id = 0,
>>      .fields = (VMStateField[]) {
>>          VMSTATE_INT8(migration_type, struct VHostUserFS),
>>          VMSTATE_VIRTIO_DEVICE,
>>          VMSTATE_END_OF_LIST()
>>      },
>>      .pre_save = vhost_user_fs_pre_save,
>>      .subsections = (const VMStateDescription*[]) {
>>          &vmstate_vhost_user_fs_internal_sub,
>>          NULL
>>      }
>> };
>>
>> And you are done.
>>
>> I will propose to use a property to set migration_type, but I didn't
>> want to write the code right now.
>>
>> I think that this proposal will make Stephan happy, and it is just
>> adding and extra uint8_t that is helpul to implement everything.
>
> That is exactly the approach I'm trying to implement it right now. Single
> flag can be convenient for orchestrator but makes it too hard to account in
> all cases for all devices on qemu side without breaking future
> compatibility.
> So I'm rewriting it with properties.

Nice.  That would be my proposal.  Just a bit complicated for a proof of concetp.

> BTW do you think each vhost-user device should have its own enum of
> migration
> types or maybe we could make them common for all device types?

I will put it for vhost-user, because as far as I know nobody else is
asking for this functionality.

The most similar device that I can think of right now is vfio devices.
But they are implemeting callbacks to save hardware device state, and
they go device by device, i.e. there is nothing general there.

Later, Juan.

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Anton Kuchin 3 years ago

On 02/02/2023 11:59, Juan Quintela wrote:
> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>> On 01/02/2023 16:26, Juan Quintela wrote:
>>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
>>>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
>>>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>> Once told that, I think that you are making your live harder in the
>>> future when you add the other migratable devices.
>>>
>>> I am assuming here that your "underlying device" is:
>>>
>>> enum VhostUserFSType {
>>>       VHOST_USER_NO_MIGRATABLE = 0,
>>>       // The one we are doing here
>>>       VHOST_USER_EXTERNAL_MIGRATABLE,
>>>       // The one you describe for the future
>>>       VHOST_USER_INTERNAL_MIGRATABLE,
>>> };
>>>
>>> struct VHostUserFS {
>>>       /*< private >*/
>>>       VirtIODevice parent;
>>>       VHostUserFSConf conf;
>>>       struct vhost_virtqueue *vhost_vqs;
>>>       struct vhost_dev vhost_dev;
>>>       VhostUserState vhost_user;
>>>       VirtQueue **req_vqs;
>>>       VirtQueue *hiprio_vq;
>>>       int32_t bootindex;
>>>       enum migration_type;
>>>       /*< public >*/
>>> };
>>>
>>>
>>> static int vhost_user_fs_pre_save(void *opaque)
>>> {
>>>       VHostUserFS *s = opaque;
>>>
>>>       if (s->migration_type == VHOST_USER_NO_MIGRATABLE)) {
>>>           error_report("Migration of vhost-user-fs devices requires internal FUSE "
>>>                        "state of backend to be preserved. If orchestrator can "
>>>                        "guarantee this (e.g. dst connects to the same backend "
>>>                        "instance or backend state is migrated) set 'vhost-user-fs' "
>>>                        "migration capability to true to enable migration.");
>>>           return -1;
>>>       }
>>>       if (s->migration_type == VHOST_USER_EXTERNAL_MIGRATABLE) {
>>>           return 0;
>>>       }
>>>       if (s->migration_type == VHOST_USER_INTERNAL_MIGRATABLE) {
>>>           error_report("still not implemented");
>>>           return -1;
>>>       }
>>>       assert("we don't reach here");
>>> }
>>>
>>> Your initial vmstateDescription
>>>
>>> static const VMStateDescription vuf_vmstate = {
>>>       .name = "vhost-user-fs",
>>>       .unmigratable = 1,
>>>       .minimum_version_id = 0,
>>>       .version_id = 0,
>>>       .fields = (VMStateField[]) {
>>>           VMSTATE_INT8(migration_type, struct VHostUserFS),
>>>           VMSTATE_VIRTIO_DEVICE,
>>>           VMSTATE_END_OF_LIST()
>>>       },
>>>       .pre_save = vhost_user_fs_pre_save,
>>> };
>>>
>>> And later you change it to something like:
>>>
>>> static bool vhost_fs_user_internal_state_needed(void *opaque)
>>> {
>>>       VHostUserFS *s = opaque;
>>>
>>>       return s->migration_type == VMOST_USER_INTERNAL_MIGRATABLE;
>>> }
>>>
>>> static const VMStateDescription vmstate_vhost_user_fs_internal_sub = {
>>>       .name = "vhost-user-fs/internal",
>>>       .version_id = 1,
>>>       .minimum_version_id = 1,
>>>       .needed = &vhost_fs_user_internal_state_needed,
>>>       .fields = (VMStateField[]) {
>>>           .... // Whatever
>>>           VMSTATE_END_OF_LIST()
>>>       }
>>> };
>>>
>>> static const VMStateDescription vuf_vmstate = {
>>>       .name = "vhost-user-fs",
>>>       .minimum_version_id = 0,
>>>       .version_id = 0,
>>>       .fields = (VMStateField[]) {
>>>           VMSTATE_INT8(migration_type, struct VHostUserFS),
>>>           VMSTATE_VIRTIO_DEVICE,
>>>           VMSTATE_END_OF_LIST()
>>>       },
>>>       .pre_save = vhost_user_fs_pre_save,
>>>       .subsections = (const VMStateDescription*[]) {
>>>           &vmstate_vhost_user_fs_internal_sub,
>>>           NULL
>>>       }
>>> };
>>>
>>> And you are done.
>>>
>>> I will propose to use a property to set migration_type, but I didn't
>>> want to write the code right now.

I have a little problem with implementation here and need an advice:

Can we make this property adjustable at runtime after device was realized?
There is a statement in qemu wiki [1] that makes me think this is possible
but I couldn't find any code for it or example in other devices.
 > "Properties are, by default, locked while the device is realized. 
Exceptions
 > can be made by the devices themselves in order to implement a way for 
a user
 > to interact with a device while it is realized."

Or is your idea just to set this property once at construction and keep it
constant for device lifetime?

[1] https://wiki.qemu.org/Features/QOM

>>>
>>> I think that this proposal will make Stephan happy, and it is just
>>> adding and extra uint8_t that is helpul to implement everything.
>> That is exactly the approach I'm trying to implement it right now. Single
>> flag can be convenient for orchestrator but makes it too hard to account in
>> all cases for all devices on qemu side without breaking future
>> compatibility.
>> So I'm rewriting it with properties.
> Nice.  That would be my proposal.  Just a bit complicated for a proof of concetp.
>
>> BTW do you think each vhost-user device should have its own enum of
>> migration
>> types or maybe we could make them common for all device types?
> I will put it for vhost-user, because as far as I know nobody else is
> asking for this functionality.

I mean do we need it for all vhost-user devices or only for vhost-user-fs
that I'm implementing now?

>
> The most similar device that I can think of right now is vfio devices.
> But they are implemeting callbacks to save hardware device state, and
> they go device by device, i.e. there is nothing general there.
>
> Later, Juan.
>

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Juan Quintela 3 years ago

Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> On 02/02/2023 11:59, Juan Quintela wrote:
>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>> On 01/02/2023 16:26, Juan Quintela wrote:
>>>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
>>>>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
>>>>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
>>>>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>>>> Once told that, I think that you are making your live harder in the
>>>> future when you add the other migratable devices.
>>>>
>>>> static const VMStateDescription vuf_vmstate = {
>>>>       .name = "vhost-user-fs",
>>>>       .minimum_version_id = 0,
>>>>       .version_id = 0,
>>>>       .fields = (VMStateField[]) {
>>>>           VMSTATE_INT8(migration_type, struct VHostUserFS),
>>>>           VMSTATE_VIRTIO_DEVICE,
>>>>           VMSTATE_END_OF_LIST()
>>>>       },
>>>>       .pre_save = vhost_user_fs_pre_save,
>>>>       .subsections = (const VMStateDescription*[]) {
>>>>           &vmstate_vhost_user_fs_internal_sub,
>>>>           NULL
>>>>       }
>>>> };
>>>>
>>>> And you are done.
>>>>
>>>> I will propose to use a property to set migration_type, but I didn't
>>>> want to write the code right now.
>
> I have a little problem with implementation here and need an advice:
>
> Can we make this property adjustable at runtime after device was realized?
> There is a statement in qemu wiki [1] that makes me think this is possible
> but I couldn't find any code for it or example in other devices.
>> "Properties are, by default, locked while the device is
>   realized. Exceptions
>> can be made by the devices themselves in order to implement a way
>   for a user
>> to interact with a device while it is realized."
>
> Or is your idea just to set this property once at construction and keep it
> constant for device lifetime?
>
> [1] https://wiki.qemu.org/Features/QOM

I have no clue here.  Markus?  Stefan?

>>>>
>>>> I think that this proposal will make Stephan happy, and it is just
>>>> adding and extra uint8_t that is helpul to implement everything.
>>> That is exactly the approach I'm trying to implement it right now. Single
>>> flag can be convenient for orchestrator but makes it too hard to account in
>>> all cases for all devices on qemu side without breaking future
>>> compatibility.
>>> So I'm rewriting it with properties.
>> Nice.  That would be my proposal.  Just a bit complicated for a proof of concetp.
>>
>>> BTW do you think each vhost-user device should have its own enum of
>>> migration
>>> types or maybe we could make them common for all device types?
>> I will put it for vhost-user, because as far as I know nobody else is
>> asking for this functionality.
>
> I mean do we need it for all vhost-user devices or only for vhost-user-fs
> that I'm implementing now?

I will put it only for vhost-user-fs, except if there is a central place
that is used for all vhost-user and its easy to put there.

But I don't know enough about vhost-user to know if there is any common
struct to put this.

>> The most similar device that I can think of right now is vfio devices.
>> But they are implemeting callbacks to save hardware device state, and
>> they go device by device, i.e. there is nothing general there.
>>
>> Later, Juan.
>>

Later, Juan.

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 2 years, 11 months ago

On Fri, Feb 10, 2023 at 05:08:16PM +0100, Juan Quintela wrote:
> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> > On 02/02/2023 11:59, Juan Quintela wrote:
> >> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>> On 01/02/2023 16:26, Juan Quintela wrote:
> >>>> Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>>> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
> >>>>>> On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>>>>> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
> >>>>>>>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>>>>>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
> >>>>>>>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>> Once told that, I think that you are making your live harder in the
> >>>> future when you add the other migratable devices.
> >>>>
> >>>> static const VMStateDescription vuf_vmstate = {
> >>>>       .name = "vhost-user-fs",
> >>>>       .minimum_version_id = 0,
> >>>>       .version_id = 0,
> >>>>       .fields = (VMStateField[]) {
> >>>>           VMSTATE_INT8(migration_type, struct VHostUserFS),
> >>>>           VMSTATE_VIRTIO_DEVICE,
> >>>>           VMSTATE_END_OF_LIST()
> >>>>       },
> >>>>       .pre_save = vhost_user_fs_pre_save,
> >>>>       .subsections = (const VMStateDescription*[]) {
> >>>>           &vmstate_vhost_user_fs_internal_sub,
> >>>>           NULL
> >>>>       }
> >>>> };
> >>>>
> >>>> And you are done.
> >>>>
> >>>> I will propose to use a property to set migration_type, but I didn't
> >>>> want to write the code right now.
> >
> > I have a little problem with implementation here and need an advice:
> >
> > Can we make this property adjustable at runtime after device was realized?
> > There is a statement in qemu wiki [1] that makes me think this is possible
> > but I couldn't find any code for it or example in other devices.
> >> "Properties are, by default, locked while the device is
> >   realized. Exceptions
> >> can be made by the devices themselves in order to implement a way
> >   for a user
> >> to interact with a device while it is realized."
> >
> > Or is your idea just to set this property once at construction and keep it
> > constant for device lifetime?
> >
> > [1] https://wiki.qemu.org/Features/QOM
> 
> I have no clue here.  Markus?  Stefan?

Sorry for the late reply. Yes, QOM properties can be set after realize
(e.g. using the qom-set command).

The set() callback can return an error, so some properties are
implemented to refuse updates when ->realize is true.

Stefan

Re: [PATCH] vhost-user-fs: add capability to allow migration

Posted by Stefan Hajnoczi 3 years ago

On Thu, 19 Jan 2023 at 11:58, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
>
> On 19/01/2023 18:02, Stefan Hajnoczi wrote:
> > On Thu, 19 Jan 2023 at 10:29, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >> On 19/01/2023 16:30, Stefan Hajnoczi wrote:
> >>> On Thu, 19 Jan 2023 at 07:43, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>> On 18/01/2023 17:52, Stefan Hajnoczi wrote:
> >>>>> On Sun, 15 Jan 2023 at 12:21, Anton Kuchin <antonkuchin@yandex-team.ru> wrote:
> >>>>>> Now any vhost-user-fs device makes VM unmigratable, that also prevents
> >>>>>> qemu update without stopping the VM. In most cases that makes sense
> >>>>>> because qemu has no way to transfer FUSE session state.
> >>>>>>
> >>>>>> But we can give an option to orchestrator to override this if it can
> >>>>>> guarantee that state will be preserved (e.g. it uses migration to
> >>>>>> update qemu and dst will run on the same host as src and use the same
> >>>>>> socket endpoints).
> >>>>>>
> >>>>>> This patch keeps default behavior that prevents migration with such devices
> >>>>>> but adds migration capability 'vhost-user-fs' to explicitly allow migration.
> >>>>>>
> >>>>>> Signed-off-by: Anton Kuchin <antonkuchin@yandex-team.ru>
> >>>>>> ---
> >>>>>>     hw/virtio/vhost-user-fs.c | 25 ++++++++++++++++++++++++-
> >>>>>>     qapi/migration.json       |  7 ++++++-
> >>>>>>     2 files changed, 30 insertions(+), 2 deletions(-)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/vhost-user-fs.c b/hw/virtio/vhost-user-fs.c
> >>>>>> index f5049735ac..13d920423e 100644
> >>>>>> --- a/hw/virtio/vhost-user-fs.c
> >>>>>> +++ b/hw/virtio/vhost-user-fs.c
> >>>>>> @@ -24,6 +24,7 @@
> >>>>>>     #include "hw/virtio/vhost-user-fs.h"
> >>>>>>     #include "monitor/monitor.h"
> >>>>>>     #include "sysemu/sysemu.h"
> >>>>>> +#include "migration/migration.h"
> >>>>>>
> >>>>>>     static const int user_feature_bits[] = {
> >>>>>>         VIRTIO_F_VERSION_1,
> >>>>>> @@ -298,9 +299,31 @@ static struct vhost_dev *vuf_get_vhost(VirtIODevice *vdev)
> >>>>>>         return &fs->vhost_dev;
> >>>>>>     }
> >>>>>>
> >>>>>> +static int vhost_user_fs_pre_save(void *opaque)
> >>>>>> +{
> >>>>>> +    MigrationState *s = migrate_get_current();
> >>>>>> +
> >>>>>> +    if (!s->enabled_capabilities[MIGRATION_CAPABILITY_VHOST_USER_FS]) {
> >>>>>> +        error_report("Migration of vhost-user-fs devices requires internal FUSE "
> >>>>>> +                     "state of backend to be preserved. If orchestrator can "
> >>>>>> +                     "guarantee this (e.g. dst connects to the same backend "
> >>>>>> +                     "instance or backend state is migrated) set 'vhost-user-fs' "
> >>>>>> +                     "migration capability to true to enable migration.");
> >>>>>> +        return -1;
> >>>>>> +    }
> >>>>>> +
> >>>>>> +    return 0;
> >>>>>> +}
> >>>>>> +
> >>>>>>     static const VMStateDescription vuf_vmstate = {
> >>>>>>         .name = "vhost-user-fs",
> >>>>>> -    .unmigratable = 1,
> >>>>>> +    .minimum_version_id = 0,
> >>>>>> +    .version_id = 0,
> >>>>>> +    .fields = (VMStateField[]) {
> >>>>>> +        VMSTATE_VIRTIO_DEVICE,
> >>>>>> +        VMSTATE_END_OF_LIST()
> >>>>>> +    },
> >>>>>> +   .pre_save = vhost_user_fs_pre_save,
> >>>>>>     };
> >>>>> Will it be possible to extend this vmstate when virtiofsd adds support
> >>>>> for stateful migration without breaking migration compatibility?
> >>>>>
> >>>>> If not, then I think a marker field should be added to the vmstate:
> >>>>> 0 - stateless/reconnect migration (the approach you're adding in this patch)
> >>>>> 1 - stateful migration (future virtiofsd feature)
> >>>>>
> >>>>> When the field is 0 there are no further vmstate fields and we trust
> >>>>> that the destination vhost-user-fs server already has the necessary
> >>>>> state.
> >>>>>
> >>>>> When the field is 1 there are additional vmstate fields that contain
> >>>>> the virtiofsd state.
> >>>>>
> >>>>> The goal is for QEMU to support 3 migration modes, depending on the
> >>>>> vhost-user-fs server:
> >>>>> 1. No migration support.
> >>>>> 2. Stateless migration.
> >>>>> 3. Stateful migration.
> >>>> Sure. These vmstate fields are very generic and mandatory for any
> >>>> virtio device. If in future more state can be transfer in migration
> >>>> stream the vmstate can be extended with additional fields. This can
> >>>> be done with new subsections and/or bumping version_id.
> >>> My concern is that the vmstate introduced in this patch may be
> >>> unusable when stateful migration is added. So additional compatibility
> >>> code will need to be introduced to make your stateless migration
> >>> continue working with extended vmstate.
> >>>
> >>> By adding a marker field in this patch it should be possible to
> >>> continue using the same vmstate for stateless migration without adding
> >>> extra compatibility code in the future.
> >> I understand, but this fields in vmstate just packs generic virtio
> >> device state that is accessible by qemu. All additional data could be
> >> added later by extra fields. I believe we couldn't pull off any type
> >> of virtio device migration without transferring virtqueues so more
> >> sophisticated types of migration would require adding more data and
> >> not modification to this part of vmstate.
> > What I'm saying is that your patch could define the vmstate such that
> > it that contains a field to differentiate between stateless and
> > stateful migration. That way QEMU versions that only support stateless
> > migration (this patch) will be able to migrate to future QEMU versions
> > that support both stateless and stateful without compatibility issues.
> I double-checked migration documentation for subsections at
> https://www.qemu.org/docs/master/devel/migration.html#subsections
> and believe it perfectly describes our case: virtio device state
> should always be transferred both in stateless or stateful migration.
> With stateful one we would add new subsection with extra data that
> will be transferred only if stateless capability is not set. We can
> connect this subsection to device property and machine type if we
> need to.
> On the receiving side we always need basic virtio device state and
> newer versions will be able to load extra data from subsection if it
> is present, while older versions will be still able to accept the
> migrations that were initiated from new versions with stateless flag
> set and don't have extra subsection.
>
> The only scenario that will fail is older qemu won't be able to load
> new migration stream with additional subsection that it can't
> understand - this is general limitation of migration compatibility.
> So we can't completely protect older versions from future migration
> stream format because we don't know what will be in that stream
> but looks like we have all the tools to maintain compatibility
> reasonably wide.
> > I'm not sure if my suggestion to add a marker field to vuf_vmstate is
> > the best way to do this, but have you thought of how to handle the
> > future addition of stateful migration to the vmstate without breaking
> > stateless vmstates? Maybe David Gilbert has a suggestion for how to do
> > this cleanly.
> >
> > Stefan
>
> I think we'd be better without a new marker because migration code
> has standard generic way of solving such puzzles that I described
> above. So adding new marker would go against existing practice.

That sounds good to me, thanks!

Stefan