[PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)

Philippe Mathieu-Daudé posted 17 patches 2 months, 3 weeks ago
[PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Philippe Mathieu-Daudé 2 months, 3 weeks ago
"General command" (GEN_CMD, CMD56) is described as:

  GEN_CMD is the same as the single block read or write
  commands (CMD24 or CMD17). The difference is that [...]
  the data block is not a memory payload data but has a
  vendor specific format and meaning.

Thus this block must not be stored overwriting data block
on underlying storage drive. Keep it in a dedicated
'vendor_data[]' array.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Cédric Le Goater <clg@redhat.com>
---
RFC: Is it safe to reuse VMSTATE_UNUSED_V() (which happens
to be the same size)?

Cc: Peter Xu <peterx@redhat.com>
Cc: Fabiano Rosas <farosas@suse.de>
---
 hw/sd/sd.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 464576751a..1f3eea6e84 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -142,6 +142,8 @@ struct SDState {
     uint64_t data_start;
     uint32_t data_offset;
     uint8_t data[512];
+    uint8_t vendor_data[512];
+
     qemu_irq readonly_cb;
     qemu_irq inserted_cb;
     QEMUTimer *ocr_power_timer;
@@ -656,6 +658,7 @@ static void sd_reset(DeviceState *dev)
     sd->wp_switch = sd->blk ? !blk_is_writable(sd->blk) : false;
     sd->wp_group_bits = sect;
     sd->wp_group_bmap = bitmap_new(sd->wp_group_bits);
+    memset(sd->vendor_data, 0xec, sizeof(sd->vendor_data));
     memset(sd->function_group, 0, sizeof(sd->function_group));
     sd->erase_start = INVALID_ADDRESS;
     sd->erase_end = INVALID_ADDRESS;
@@ -771,7 +774,7 @@ static const VMStateDescription sd_vmstate = {
         VMSTATE_UINT64(data_start, SDState),
         VMSTATE_UINT32(data_offset, SDState),
         VMSTATE_UINT8_ARRAY(data, SDState, 512),
-        VMSTATE_UNUSED_V(1, 512),
+        VMSTATE_UINT8_ARRAY(vendor_data, SDState, 512),
         VMSTATE_BOOL(enable, SDState),
         VMSTATE_END_OF_LIST()
     },
@@ -2029,9 +2032,8 @@ void sd_write_byte(SDState *sd, uint8_t value)
         break;
 
     case 56:  /* CMD56:  GEN_CMD */
-        sd->data[sd->data_offset ++] = value;
-        if (sd->data_offset >= sd->blk_len) {
-            APP_WRITE_BLOCK(sd->data_start, sd->data_offset);
+        sd->vendor_data[sd->data_offset ++] = value;
+        if (sd->data_offset >= sizeof(sd->vendor_data)) {
             sd->state = sd_transfer_state;
         }
         break;
@@ -2165,12 +2167,11 @@ uint8_t sd_read_byte(SDState *sd)
         break;
 
     case 56:  /* CMD56:  GEN_CMD */
-        if (sd->data_offset == 0)
-            APP_READ_BLOCK(sd->data_start, sd->blk_len);
-        ret = sd->data[sd->data_offset ++];
+        ret = sd->vendor_data[sd->data_offset ++];
 
-        if (sd->data_offset >= sd->blk_len)
+        if (sd->data_offset >= sizeof(sd->vendor_data)) {
             sd->state = sd_transfer_state;
+        }
         break;
 
     default:
-- 
2.41.0


Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Philippe Mathieu-Daudé <philmd@linaro.org> writes:

> "General command" (GEN_CMD, CMD56) is described as:
>
>   GEN_CMD is the same as the single block read or write
>   commands (CMD24 or CMD17). The difference is that [...]
>   the data block is not a memory payload data but has a
>   vendor specific format and meaning.
>
> Thus this block must not be stored overwriting data block
> on underlying storage drive. Keep it in a dedicated
> 'vendor_data[]' array.
>
> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Tested-by: Cédric Le Goater <clg@redhat.com>
> ---
> RFC: Is it safe to reuse VMSTATE_UNUSED_V() (which happens
> to be the same size)?

Hi, sorry it took some time to get to this, I had just left for vacation
when you first posted.

I think it's ok:

{
  "field": "unused",
  "version_id": 1,
  "field_exists": false,
  "size": 512
},

vs.

{
  "field": "vendor_data",
  "version_id": 0,
  "field_exists": false,
  "num": 512,
  "size": 1
},

The unused field was introduced in 2016 so there's no chance of
migrating a QEMU that old to/from 9.1.
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Peter Xu 2 months, 1 week ago
On Tue, Jul 09, 2024 at 05:38:54PM -0300, Fabiano Rosas wrote:
> Philippe Mathieu-Daudé <philmd@linaro.org> writes:
> 
> > "General command" (GEN_CMD, CMD56) is described as:
> >
> >   GEN_CMD is the same as the single block read or write
> >   commands (CMD24 or CMD17). The difference is that [...]
> >   the data block is not a memory payload data but has a
> >   vendor specific format and meaning.
> >
> > Thus this block must not be stored overwriting data block
> > on underlying storage drive. Keep it in a dedicated
> > 'vendor_data[]' array.
> >
> > Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> > Tested-by: Cédric Le Goater <clg@redhat.com>
> > ---
> > RFC: Is it safe to reuse VMSTATE_UNUSED_V() (which happens
> > to be the same size)?
> 
> Hi, sorry it took some time to get to this, I had just left for vacation
> when you first posted.

And I totally overlooked there's the email.. until you replied.  Welcome
back.

> 
> I think it's ok:
> 
> {
>   "field": "unused",
>   "version_id": 1,
>   "field_exists": false,
>   "size": 512
> },
> 
> vs.
> 
> {
>   "field": "vendor_data",
>   "version_id": 0,
>   "field_exists": false,
>   "num": 512,
>   "size": 1
> },
> 
> The unused field was introduced in 2016 so there's no chance of
> migrating a QEMU that old to/from 9.1.

What happens if an old qemu 9.0 sends rubbish here to a new QEMU, while the
new QEMU would consider it meaningful data?

-- 
Peter Xu


Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Peter Xu <peterx@redhat.com> writes:

> On Tue, Jul 09, 2024 at 05:38:54PM -0300, Fabiano Rosas wrote:
>> Philippe Mathieu-Daudé <philmd@linaro.org> writes:
>> 
>> > "General command" (GEN_CMD, CMD56) is described as:
>> >
>> >   GEN_CMD is the same as the single block read or write
>> >   commands (CMD24 or CMD17). The difference is that [...]
>> >   the data block is not a memory payload data but has a
>> >   vendor specific format and meaning.
>> >
>> > Thus this block must not be stored overwriting data block
>> > on underlying storage drive. Keep it in a dedicated
>> > 'vendor_data[]' array.
>> >
>> > Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>> > Tested-by: Cédric Le Goater <clg@redhat.com>
>> > ---
>> > RFC: Is it safe to reuse VMSTATE_UNUSED_V() (which happens
>> > to be the same size)?
>> 
>> Hi, sorry it took some time to get to this, I had just left for vacation
>> when you first posted.
>
> And I totally overlooked there's the email.. until you replied.  Welcome
> back.

Thanks!

>
>> 
>> I think it's ok:
>> 
>> {
>>   "field": "unused",
>>   "version_id": 1,
>>   "field_exists": false,
>>   "size": 512
>> },
>> 
>> vs.
>> 
>> {
>>   "field": "vendor_data",
>>   "version_id": 0,
>>   "field_exists": false,
>>   "num": 512,
>>   "size": 1
>> },
>> 
>> The unused field was introduced in 2016 so there's no chance of
>> migrating a QEMU that old to/from 9.1.
>
> What happens if an old qemu 9.0 sends rubbish here to a new QEMU, while the
> new QEMU would consider it meaningful data?

It will send zeros, no? The code will have to cope with that. The
alternative is to put the vendor_data in a subsection and the code will
also have to cope with the lack of data when the old QEMU doesn't send
it.
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Peter Xu 2 months, 1 week ago
On Wed, Jul 10, 2024 at 11:08:20AM -0300, Fabiano Rosas wrote:
> >> I think it's ok:
> >> 
> >> {
> >>   "field": "unused",
> >>   "version_id": 1,
> >>   "field_exists": false,
> >>   "size": 512
> >> },
> >> 
> >> vs.
> >> 
> >> {
> >>   "field": "vendor_data",
> >>   "version_id": 0,
> >>   "field_exists": false,
> >>   "num": 512,
> >>   "size": 1
> >> },
> >> 
> >> The unused field was introduced in 2016 so there's no chance of
> >> migrating a QEMU that old to/from 9.1.
> >
> > What happens if an old qemu 9.0 sends rubbish here to a new QEMU, while the
> > new QEMU would consider it meaningful data?
> 
> It will send zeros, no? The code will have to cope with that. The
> alternative is to put the vendor_data in a subsection and the code will
> also have to cope with the lack of data when the old QEMU doesn't send
> it.

Ah indeed, that "static const uint8_t buf[1024]" is there at least since
2017.  So yes, probably always sending zeros.

Nothing I can think of otherwise indeed, if we want to trust that nothing
will migrate before 2016.  It's just that we may want to know how that
"2016" is justified to be safe if we would like to allow that in the
future.

One thing _could_ be that "rule of thumb" is we plan to obsolete machines
with 6 years, so anything "UNUSED" older than 6 years can be over-written?

-- 
Peter Xu
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 10, 2024 at 11:08:20AM -0300, Fabiano Rosas wrote:
>> >> I think it's ok:
>> >> 
>> >> {
>> >>   "field": "unused",
>> >>   "version_id": 1,
>> >>   "field_exists": false,
>> >>   "size": 512
>> >> },
>> >> 
>> >> vs.
>> >> 
>> >> {
>> >>   "field": "vendor_data",
>> >>   "version_id": 0,
>> >>   "field_exists": false,
>> >>   "num": 512,
>> >>   "size": 1
>> >> },
>> >> 
>> >> The unused field was introduced in 2016 so there's no chance of
>> >> migrating a QEMU that old to/from 9.1.
>> >
>> > What happens if an old qemu 9.0 sends rubbish here to a new QEMU, while the
>> > new QEMU would consider it meaningful data?
>> 
>> It will send zeros, no? The code will have to cope with that. The
>> alternative is to put the vendor_data in a subsection and the code will
>> also have to cope with the lack of data when the old QEMU doesn't send
>> it.
>
> Ah indeed, that "static const uint8_t buf[1024]" is there at least since
> 2017.  So yes, probably always sending zeros.

@Philippe, can vendor_data be 0 after migration? Otherwise 9.0 -> 9.1
migration might crash.

>
> Nothing I can think of otherwise indeed, if we want to trust that nothing
> will migrate before 2016.  It's just that we may want to know how that
> "2016" is justified to be safe if we would like to allow that in the
> future.

It's not about trust, we simply don't support migrations other than
n->n+1 and (maybe) n->n-1. So QEMU from 2016 is certainly not included.

>
> One thing _could_ be that "rule of thumb" is we plan to obsolete machines
> with 6 years, so anything "UNUSED" older than 6 years can be over-written?
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Peter Xu 2 months, 1 week ago
On Wed, Jul 10, 2024 at 01:21:51PM -0300, Fabiano Rosas wrote:
> It's not about trust, we simply don't support migrations other than
> n->n+1 and (maybe) n->n-1. So QEMU from 2016 is certainly not included.

Where does it come from?  I thought we suppport that..

The same question would be: are we requesting an OpenStack cluster to
always upgrade QEMU with +1 versions, otherwise migration will fail?

-- 
Peter Xu
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 10, 2024 at 01:21:51PM -0300, Fabiano Rosas wrote:
>> It's not about trust, we simply don't support migrations other than
>> n->n+1 and (maybe) n->n-1. So QEMU from 2016 is certainly not included.
>
> Where does it come from?  I thought we suppport that..

I'm taking that from:

docs/devel/migration/main.rst:
  "In general QEMU tries to maintain forward migration compatibility
  (i.e. migrating from QEMU n->n+1) and there are users who benefit from
  backward compatibility as well."

But of course it doesn't say whether that comes with a transitive rule
allowing n->n+2 migrations.

>
> The same question would be: are we requesting an OpenStack cluster to
> always upgrade QEMU with +1 versions, otherwise migration will fail?

Will an OpenStack cluster be using upstream QEMU? If not, then that's a
question for the distro. In a very practical sense, we're not requesting
anything. We barely test n->n+1/n->n-1, even if we had a strong support
statement I wouldn't be confident saying migration from QEMU 2.7 -> QEMU
9.1 should succeed.
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Peter Xu 2 months, 1 week ago
On Wed, Jul 10, 2024 at 04:48:23PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jul 10, 2024 at 01:21:51PM -0300, Fabiano Rosas wrote:
> >> It's not about trust, we simply don't support migrations other than
> >> n->n+1 and (maybe) n->n-1. So QEMU from 2016 is certainly not included.
> >
> > Where does it come from?  I thought we suppport that..
> 
> I'm taking that from:
> 
> docs/devel/migration/main.rst:
>   "In general QEMU tries to maintain forward migration compatibility
>   (i.e. migrating from QEMU n->n+1) and there are users who benefit from
>   backward compatibility as well."
> 
> But of course it doesn't say whether that comes with a transitive rule
> allowing n->n+2 migrations.

I'd say that "i.e." implies n->n+1 is not the only forward migration we
would support.

I _think_ we should support all forward migration as long as the machine
type matches.

> 
> >
> > The same question would be: are we requesting an OpenStack cluster to
> > always upgrade QEMU with +1 versions, otherwise migration will fail?
> 
> Will an OpenStack cluster be using upstream QEMU? If not, then that's a

It's an example to show what I meant! :) Nothing else. Definitely not
saying that everyone should use an upstream released QEMU (but in reality,
it's not a problem, I think, and I do feel like people use them, perhaps
more with the stable releases).

> question for the distro. In a very practical sense, we're not requesting
> anything. We barely test n->n+1/n->n-1, even if we had a strong support
> statement I wouldn't be confident saying migration from QEMU 2.7 -> QEMU
> 9.1 should succeed.

No matter what we test in CI, I don't think we should break that for >1
versions..  I hope 2.7->9.1 keeps working, otherwise I think it's legal to
file a bug by anyone.

For example, I randomly fetched a bug report:

https://gitlab.com/qemu-project/qemu/-/issues/1937

QEMU version:                6.2 and 7.2.5

And I believe that's the common case even for upstream.  If we don't do
that right for upstream, it can be impossible tasks for downstream and for
all of us to maintain.

-- 
Peter Xu
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 10, 2024 at 04:48:23PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Jul 10, 2024 at 01:21:51PM -0300, Fabiano Rosas wrote:
>> >> It's not about trust, we simply don't support migrations other than
>> >> n->n+1 and (maybe) n->n-1. So QEMU from 2016 is certainly not included.
>> >
>> > Where does it come from?  I thought we suppport that..
>> 
>> I'm taking that from:
>> 
>> docs/devel/migration/main.rst:
>>   "In general QEMU tries to maintain forward migration compatibility
>>   (i.e. migrating from QEMU n->n+1) and there are users who benefit from
>>   backward compatibility as well."
>> 
>> But of course it doesn't say whether that comes with a transitive rule
>> allowing n->n+2 migrations.
>
> I'd say that "i.e." implies n->n+1 is not the only forward migration we
> would support.
>
> I _think_ we should support all forward migration as long as the machine
> type matches.
>
>> 
>> >
>> > The same question would be: are we requesting an OpenStack cluster to
>> > always upgrade QEMU with +1 versions, otherwise migration will fail?
>> 
>> Will an OpenStack cluster be using upstream QEMU? If not, then that's a
>
> It's an example to show what I meant! :) Nothing else. Definitely not
> saying that everyone should use an upstream released QEMU (but in reality,
> it's not a problem, I think, and I do feel like people use them, perhaps
> more with the stable releases).
>
>> question for the distro. In a very practical sense, we're not requesting
>> anything. We barely test n->n+1/n->n-1, even if we had a strong support
>> statement I wouldn't be confident saying migration from QEMU 2.7 -> QEMU
>> 9.1 should succeed.
>
> No matter what we test in CI, I don't think we should break that for >1
> versions..  I hope 2.7->9.1 keeps working, otherwise I think it's legal to
> file a bug by anyone.
>
> For example, I randomly fetched a bug report:
>
> https://gitlab.com/qemu-project/qemu/-/issues/1937
>
> QEMU version:                6.2 and 7.2.5
>
> And I believe that's the common case even for upstream.  If we don't do
> that right for upstream, it can be impossible tasks for downstream and for
> all of us to maintain.

But do we do that right currently? I have no idea. Have we ever done
it? And we're here discussing a hypothetical 2.7->9.1 ...

So we cannot reuse the UNUSED field because QEMU from 2016 might send
their data and QEMU from 2024 would interpret it wrong.

How do we proceed? Add a subsection. And make the code survive when
receiving 0.

@Peter is that it? What about backwards-compat? We'll need a property as
well it seems.
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Peter Xu 2 months, 1 week ago
On Wed, Jul 10, 2024 at 06:38:26PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, Jul 10, 2024 at 04:48:23PM -0300, Fabiano Rosas wrote:
> >> Peter Xu <peterx@redhat.com> writes:
> >> 
> >> > On Wed, Jul 10, 2024 at 01:21:51PM -0300, Fabiano Rosas wrote:
> >> >> It's not about trust, we simply don't support migrations other than
> >> >> n->n+1 and (maybe) n->n-1. So QEMU from 2016 is certainly not included.
> >> >
> >> > Where does it come from?  I thought we suppport that..
> >> 
> >> I'm taking that from:
> >> 
> >> docs/devel/migration/main.rst:
> >>   "In general QEMU tries to maintain forward migration compatibility
> >>   (i.e. migrating from QEMU n->n+1) and there are users who benefit from
> >>   backward compatibility as well."
> >> 
> >> But of course it doesn't say whether that comes with a transitive rule
> >> allowing n->n+2 migrations.
> >
> > I'd say that "i.e." implies n->n+1 is not the only forward migration we
> > would support.
> >
> > I _think_ we should support all forward migration as long as the machine
> > type matches.
> >
> >> 
> >> >
> >> > The same question would be: are we requesting an OpenStack cluster to
> >> > always upgrade QEMU with +1 versions, otherwise migration will fail?
> >> 
> >> Will an OpenStack cluster be using upstream QEMU? If not, then that's a
> >
> > It's an example to show what I meant! :) Nothing else. Definitely not
> > saying that everyone should use an upstream released QEMU (but in reality,
> > it's not a problem, I think, and I do feel like people use them, perhaps
> > more with the stable releases).
> >
> >> question for the distro. In a very practical sense, we're not requesting
> >> anything. We barely test n->n+1/n->n-1, even if we had a strong support
> >> statement I wouldn't be confident saying migration from QEMU 2.7 -> QEMU
> >> 9.1 should succeed.
> >
> > No matter what we test in CI, I don't think we should break that for >1
> > versions..  I hope 2.7->9.1 keeps working, otherwise I think it's legal to
> > file a bug by anyone.
> >
> > For example, I randomly fetched a bug report:
> >
> > https://gitlab.com/qemu-project/qemu/-/issues/1937
> >
> > QEMU version:                6.2 and 7.2.5
> >
> > And I believe that's the common case even for upstream.  If we don't do
> > that right for upstream, it can be impossible tasks for downstream and for
> > all of us to maintain.
> 
> But do we do that right currently? I have no idea. Have we ever done
> it? And we're here discussing a hypothetical 2.7->9.1 ...
> 
> So we cannot reuse the UNUSED field because QEMU from 2016 might send
> their data and QEMU from 2024 would interpret it wrong.
> 
> How do we proceed? Add a subsection. And make the code survive when
> receiving 0.
> 
> @Peter is that it? What about backwards-compat? We'll need a property as
> well it seems.

Compat property is definitely one way to go, but I think it's you who more
or less persuaded me that reusing it seems possible! At least I can't yet
think of anything bad if it's ancient unused buffers.

And that's why I was asking about a sane way to describe the "magic
year".. And I was very serious when I said "6 years" to follow the
deprecation of machine types, because it'll be something we can follow to
say when an unused buffer can be obsolete and it could make some sense: if
we will start to deprecate machine types, then it may not make sense to
keep any UNUSED compatible forever, after all.

And we need that ruler to be as accurate as "always 6 years to follow
machine type deprecation procedure", in case someone else tomorrow asks us
something that was only UNUSED since 2018.  We need a rule of thumb if we
want to reuse it, and if all of you agree we can start with this one,
perhaps with a comment above the field (before we think all through and
make it a rule on deprecating UNUSED)?

-- 
Peter Xu
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Peter Xu <peterx@redhat.com> writes:

> On Wed, Jul 10, 2024 at 06:38:26PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, Jul 10, 2024 at 04:48:23PM -0300, Fabiano Rosas wrote:
>> >> Peter Xu <peterx@redhat.com> writes:
>> >> 
>> >> > On Wed, Jul 10, 2024 at 01:21:51PM -0300, Fabiano Rosas wrote:
>> >> >> It's not about trust, we simply don't support migrations other than
>> >> >> n->n+1 and (maybe) n->n-1. So QEMU from 2016 is certainly not included.
>> >> >
>> >> > Where does it come from?  I thought we suppport that..
>> >> 
>> >> I'm taking that from:
>> >> 
>> >> docs/devel/migration/main.rst:
>> >>   "In general QEMU tries to maintain forward migration compatibility
>> >>   (i.e. migrating from QEMU n->n+1) and there are users who benefit from
>> >>   backward compatibility as well."
>> >> 
>> >> But of course it doesn't say whether that comes with a transitive rule
>> >> allowing n->n+2 migrations.
>> >
>> > I'd say that "i.e." implies n->n+1 is not the only forward migration we
>> > would support.
>> >
>> > I _think_ we should support all forward migration as long as the machine
>> > type matches.
>> >
>> >> 
>> >> >
>> >> > The same question would be: are we requesting an OpenStack cluster to
>> >> > always upgrade QEMU with +1 versions, otherwise migration will fail?
>> >> 
>> >> Will an OpenStack cluster be using upstream QEMU? If not, then that's a
>> >
>> > It's an example to show what I meant! :) Nothing else. Definitely not
>> > saying that everyone should use an upstream released QEMU (but in reality,
>> > it's not a problem, I think, and I do feel like people use them, perhaps
>> > more with the stable releases).
>> >
>> >> question for the distro. In a very practical sense, we're not requesting
>> >> anything. We barely test n->n+1/n->n-1, even if we had a strong support
>> >> statement I wouldn't be confident saying migration from QEMU 2.7 -> QEMU
>> >> 9.1 should succeed.
>> >
>> > No matter what we test in CI, I don't think we should break that for >1
>> > versions..  I hope 2.7->9.1 keeps working, otherwise I think it's legal to
>> > file a bug by anyone.
>> >
>> > For example, I randomly fetched a bug report:
>> >
>> > https://gitlab.com/qemu-project/qemu/-/issues/1937
>> >
>> > QEMU version:                6.2 and 7.2.5
>> >
>> > And I believe that's the common case even for upstream.  If we don't do
>> > that right for upstream, it can be impossible tasks for downstream and for
>> > all of us to maintain.
>> 
>> But do we do that right currently? I have no idea. Have we ever done
>> it? And we're here discussing a hypothetical 2.7->9.1 ...
>> 
>> So we cannot reuse the UNUSED field because QEMU from 2016 might send
>> their data and QEMU from 2024 would interpret it wrong.
>> 
>> How do we proceed? Add a subsection. And make the code survive when
>> receiving 0.
>> 
>> @Peter is that it? What about backwards-compat? We'll need a property as
>> well it seems.
>
> Compat property is definitely one way to go, but I think it's you who more
> or less persuaded me that reusing it seems possible! At least I can't yet
> think of anything bad if it's ancient unused buffers.

Since we're allowing any old QEMU version to migrate to the most recent
one, we need to think of the data that was there before the introduction
of the UNUSED field. If that QEMU version is used, then it's not going
to be zeroes on the stream, but whatever data was there before. The new
QEMU will be expecting the vendor_data introduced in this patch.

> And that's why I was asking about a sane way to describe the "magic
> year".. And I was very serious when I said "6 years" to follow the
> deprecation of machine types, because it'll be something we can follow
> to say when an unused buffer can be obsolete and it could make some
> sense: if we will start to deprecate machine types, then it may not
> make sense to keep any UNUSED compatible forever, after all.
>

Is there an easy way to look at a field and tell in which machine type's
timeframe it was introduced? If the machine type of that era has been
removed, then the field is free to go as well. I'd prefer if we had a
hard link instead of just counting years. Maybe we should to that
mapping at the machine deprecation time? As in, "look at the unused
fields introduced in that timeframe and mark them free".

> And we need that ruler to be as accurate as "always 6 years to follow
> machine type deprecation procedure", in case someone else tomorrow asks us
> something that was only UNUSED since 2018.  We need a rule of thumb if we
> want to reuse it, and if all of you agree we can start with this one,
> perhaps with a comment above the field (before we think all through and
> make it a rule on deprecating UNUSED)?
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Peter Xu 2 months, 1 week ago
On Thu, Jul 11, 2024 at 10:34:12AM -0300, Fabiano Rosas wrote:
> Is there an easy way to look at a field and tell in which machine type's
> timeframe it was introduced?

I am not aware of any.

> If the machine type of that era has been removed, then the field is free
> to go as well. I'd prefer if we had a hard link instead of just counting
> years. Maybe we should to that mapping at the machine deprecation time?
> As in, "look at the unused fields introduced in that timeframe and mark
> them free".

We can do that, but depending on how easy it would be. That can be an
overkill to me if it's non-trivial.  When it becomes complicated, I'd
rather make machine compat property easier to use so we always stick with
that.  Currently it's not as easy to use.

Maybe we shouldn't make it a common rule to let people reuse the UNUSED
fields, even if in this case it's probably fine?

E.g. I don't think it's a huge deal to keep all UNUSED fields forever -
sending 512B zeros for only one specific device isn't an issue even if kept
forever.

If "over 6 years" would be okay and simple enough, then maybe we can stick
with that (and only if people would like to reuse a field and ask; that's
after all not required..).  If more than that I doubt whether we should
spend time working on covering all the fields.

-- 
Peter Xu
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 11, 2024 at 10:34:12AM -0300, Fabiano Rosas wrote:
>> Is there an easy way to look at a field and tell in which machine type's
>> timeframe it was introduced?
>
> I am not aware of any.
>
>> If the machine type of that era has been removed, then the field is free
>> to go as well. I'd prefer if we had a hard link instead of just counting
>> years. Maybe we should to that mapping at the machine deprecation time?
>> As in, "look at the unused fields introduced in that timeframe and mark
>> them free".
>
> We can do that, but depending on how easy it would be. That can be an
> overkill to me if it's non-trivial.  When it becomes complicated, I'd
> rather make machine compat property easier to use so we always stick with
> that.  Currently it's not as easy to use.
>
> Maybe we shouldn't make it a common rule to let people reuse the UNUSED
> fields, even if in this case it's probably fine?
>
> E.g. I don't think it's a huge deal to keep all UNUSED fields forever -
> sending 512B zeros for only one specific device isn't an issue even if kept
> forever.
>
> If "over 6 years" would be okay and simple enough, then maybe we can stick
> with that (and only if people would like to reuse a field and ask; that's
> after all not required..).  If more than that I doubt whether we should
> spend time working on covering all the fields.

I'm fine with a simple rule.

But of course, that means we cannot claim to support all kinds of
forward migrations anymore. Only those in the 6 year period.
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Peter Xu 2 months, 1 week ago
On Thu, Jul 11, 2024 at 11:44:08AM -0300, Fabiano Rosas wrote:
> But of course, that means we cannot claim to support all kinds of
> forward migrations anymore. Only those in the 6 year period.

That "6 years" comes from machine type deprecation period, and migration
compatibility is mostly only attached to machine types, and we only ever
allowed migration with the same machine type.

It means, >6 years migration will never work anyway as soon as we start to
deprecate machine types (irrelevant of any reuse of UNUSED), because the >6
years machine types will simply be gone.. See configuration_post_load(),
where it'll throw an error upfront when machine type mismatched.

-- 
Peter Xu
Re: [PATCH v3 06/17] hw/sd/sdcard: Do not store vendor data on block drive (CMD56)
Posted by Fabiano Rosas 2 months, 1 week ago
Peter Xu <peterx@redhat.com> writes:

> On Thu, Jul 11, 2024 at 11:44:08AM -0300, Fabiano Rosas wrote:
>> But of course, that means we cannot claim to support all kinds of
>> forward migrations anymore. Only those in the 6 year period.
>
> That "6 years" comes from machine type deprecation period, and migration
> compatibility is mostly only attached to machine types, and we only ever
> allowed migration with the same machine type.
>
> It means, >6 years migration will never work anyway as soon as we start to
> deprecate machine types (irrelevant of any reuse of UNUSED), because the >6
> years machine types will simply be gone.. See configuration_post_load(),
> where it'll throw an error upfront when machine type mismatched.

Yes, duh! What am I talking about...