[PATCH v4] Use io_uring_register_ring_fd() to skip fd operations

Sam Li posted 1 patch 1 year, 12 months ago
There is a newer version of this series
block/io_uring.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
[PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Sam Li 1 year, 12 months ago
Linux recently added a new io_uring(7) optimization API that QEMU
doesn't take advantage of yet. The liburing library that QEMU uses
has added a corresponding new API called io_uring_register_ring_fd().
When this API is called after creating the ring, the io_uring_submit()
library function passes a flag to the io_uring_enter(2) syscall
allowing it to skip the ring file descriptor fdget()/fdput()
operations. This saves some CPU cycles.

Signed-off-by: Sam Li <faithilikerun@gmail.com>
---
 block/io_uring.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 782afdb433..5247fb79e2 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
     }
 
     ioq_init(&s->io_q);
-    return s;
+    if (io_uring_register_ring_fd(&s->ring) < 0) {
+        /*
+         * Only warn about this error: we will fallback to the non-optimized
+         * io_uring operations.
+         */
+        error_reportf_err(*errp,
+                         "failed to register linux io_uring ring file descriptor");
+    }
 
+    return s;
 }
 
 void luring_cleanup(LuringState *s)
-- 
Use error_reportf_err to avoid memory leak due to not freeing error
object.
--
2.35.1
Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Stefan Hajnoczi 1 year, 12 months ago
On Fri, Apr 22, 2022 at 12:36:49AM +0800, Sam Li wrote:
> Linux recently added a new io_uring(7) optimization API that QEMU
> doesn't take advantage of yet. The liburing library that QEMU uses
> has added a corresponding new API calling io_uring_register_ring_fd().
> When this API is called after creating the ring, the io_uring_submit()
> library function passes a flag to the io_uring_enter(2) syscall
> allowing it to skip the ring file descriptor fdget()/fdput()
> operations. This saves some CPU cycles.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/io_uring.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 782afdb433..5247fb79e2 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
>      }
>  
>      ioq_init(&s->io_q);
> -    return s;
> +    if (io_uring_register_ring_fd(&s->ring) < 0) {

What happens when QEMU is built against an older version of liburing
that lacks the io_uring_register_ring_fd() API?

I guess there will be a compiler error because the function prototype is
missing in <liburing.h>.

This can be addressed by checking for the presence of the function in
meson.build:

+config_host_data.set('CONFIG_LIBURING_REGISTER_RING_FD', cc.has_function('io_uring_register_ring_fd', prefix: '#include <liburing.h>'))

Then block/io_uring.c can call the function only when available:

+#ifdef CONFIG_LIBURING_REGISTER_RING_FD
+    io_uring_register_ring_fd(&s->ring);
+#endif

(I haven't tested this code but it should be close.)

Stefan
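
A minimal sketch of the guard pattern suggested above, assuming the meson check defines CONFIG_LIBURING_REGISTER_RING_FD when the function is available. The helper name luring_try_register_ring_fd() and the Ring typedef are hypothetical; they exist only so the fragment builds where liburing is old or absent. In QEMU the guarded call would sit directly in luring_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef CONFIG_LIBURING_REGISTER_RING_FD
#include <liburing.h>
typedef struct io_uring Ring;
#else
/* Hypothetical stand-in so the sketch compiles without a recent liburing. */
typedef struct { int ring_fd; } Ring;
#endif

/*
 * Hypothetical helper: returns true only if the ring fd was registered.
 * Without the config macro the call is compiled out, and the caller
 * keeps using the regular, non-optimized submission path.
 */
bool luring_try_register_ring_fd(Ring *ring)
{
    assert(ring != NULL);
#ifdef CONFIG_LIBURING_REGISTER_RING_FD
    /* Same convention as the patch: a negative return value means failure. */
    return io_uring_register_ring_fd(ring) >= 0;
#else
    return false;
#endif
}
```

When the macro is undefined, the helper simply reports false and the build no longer depends on the newer prototype in <liburing.h>.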
Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by olc 1 year, 12 months ago
Hi Stefan,
I've tested the code and it behaves as you expected. Should I add this to a
new patch version or leave it as is?

Sam

On Fri, Apr 22, 2022 at 23:10, Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Fri, Apr 22, 2022 at 12:36:49AM +0800, Sam Li wrote:
> > Linux recently added a new io_uring(7) optimization API that QEMU
> > doesn't take advantage of yet. The liburing library that QEMU uses
> > has added a corresponding new API calling io_uring_register_ring_fd().
> > When this API is called after creating the ring, the io_uring_submit()
> > library function passes a flag to the io_uring_enter(2) syscall
> > allowing it to skip the ring file descriptor fdget()/fdput()
> > operations. This saves some CPU cycles.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >  block/io_uring.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/io_uring.c b/block/io_uring.c
> > index 782afdb433..5247fb79e2 100644
> > --- a/block/io_uring.c
> > +++ b/block/io_uring.c
> > @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
> >      }
> >
> >      ioq_init(&s->io_q);
> > -    return s;
> > +    if (io_uring_register_ring_fd(&s->ring) < 0) {
>
> What happens when QEMU is built against an older version of liburing
> that lacks the io_uring_register_ring_fd() API?
>
> I guess there will be a compiler error because the function prototype is
> missing in <liburing.h>.
>
> This can be addressed by checking for the presence of the function in
> meson.build:
>
> +config_host_data.set('CONFIG_LIBURING_REGISTER_RING_FD',
> cc.has_function('io_uring_register_ring_fd', prefix: '#include
> <liburing.h>'))
>
> Then block/io_uring.c can call the function only when available:
>
> +#ifdef CONFIG_LIBURING_REGISTER_RING_FD
> +    io_uring_register_ring_fd(&s->ring);
> +#endif
>
> (I haven't tested this code but it should be close.)
>
> Stefan
>
Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Stefan Hajnoczi 1 year, 10 months ago
On Fri, 22 Apr 2022 at 16:40, olc <faithilikerun@gmail.com> wrote:
>
> Hi Stefan,
> I've tested the code and it behaves as you expected. Should I add this to a new patch version or leave it as is?

Hi Sam,
Sorry I missed this email. Please send a new version of the patch with
CONFIG_LIBURING_REGISTER_RING_FD.

Thanks,
Stefan
Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Fam Zheng 1 year, 12 months ago
On 2022-04-22 00:36, Sam Li wrote:
> Linux recently added a new io_uring(7) optimization API that QEMU
> doesn't take advantage of yet. The liburing library that QEMU uses
> has added a corresponding new API calling io_uring_register_ring_fd().
> When this API is called after creating the ring, the io_uring_submit()
> library function passes a flag to the io_uring_enter(2) syscall
> allowing it to skip the ring file descriptor fdget()/fdput()
> operations. This saves some CPU cycles.
> 
> Signed-off-by: Sam Li <faithilikerun@gmail.com>
> ---
>  block/io_uring.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/io_uring.c b/block/io_uring.c
> index 782afdb433..5247fb79e2 100644
> --- a/block/io_uring.c
> +++ b/block/io_uring.c
> @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
>      }
>  
>      ioq_init(&s->io_q);
> -    return s;
> +    if (io_uring_register_ring_fd(&s->ring) < 0) {
> +        /*
> +         * Only warn about this error: we will fallback to the non-optimized
> +         * io_uring operations.
> +         */
> +        error_reportf_err(*errp,
> +                         "failed to register linux io_uring ring file descriptor");

IIUC errp can be NULL, so let's not dereference it without checking. So, just
use error_report?

Fam

> +    }
>  
> +    return s;
>  }
>  
>  void luring_cleanup(LuringState *s)
> -- 
> Use error_reportf_err to avoid memory leak due to not freeing error
> object.
> --
> 2.35.1
> 
>
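
A small sketch of the point about errp, using simplified stand-ins rather than QEMU's real Error API. The Error struct, report_plain() and ring_init_sketch() are all hypothetical. The idea shown is that a caller may pass errp == NULL, so a non-fatal path should report through a function that needs no Error object instead of reading *errp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-in for QEMU's Error object (assumption, not the real type). */
typedef struct Error { const char *msg; } Error;

/* Hypothetical stand-in for error_report(): takes only a message. */
static int reports_emitted;
static void report_plain(const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "%s\n", msg);
    reports_emitted++;
}

/*
 * Hypothetical init path. Callers may pass errp == NULL to say they do
 * not want error details, so *errp must never be read unconditionally.
 */
bool ring_init_sketch(bool register_ok, Error **errp)
{
    (void)errp; /* non-fatal path: errp is neither read nor written */
    if (!register_ok) {
        report_plain("failed to register linux io_uring ring file descriptor");
    }
    return true; /* init still succeeds; only the optimization is lost */
}
```

Calling ring_init_sketch(false, NULL) is safe here because the failure branch never touches errp.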
Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Daniel P. Berrangé 1 year, 12 months ago
On Fri, Apr 22, 2022 at 09:34:28AM +0100, Fam Zheng wrote:
> On 2022-04-22 00:36, Sam Li wrote:
> > Linux recently added a new io_uring(7) optimization API that QEMU
> > doesn't take advantage of yet. The liburing library that QEMU uses
> > has added a corresponding new API calling io_uring_register_ring_fd().
> > When this API is called after creating the ring, the io_uring_submit()
> > library function passes a flag to the io_uring_enter(2) syscall
> > allowing it to skip the ring file descriptor fdget()/fdput()
> > operations. This saves some CPU cycles.
> > 
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >  block/io_uring.c | 10 +++++++++-
> >  1 file changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/io_uring.c b/block/io_uring.c
> > index 782afdb433..5247fb79e2 100644
> > --- a/block/io_uring.c
> > +++ b/block/io_uring.c
> > @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
> >      }
> >  
> >      ioq_init(&s->io_q);
> > -    return s;
> > +    if (io_uring_register_ring_fd(&s->ring) < 0) {
> > +        /*
> > +         * Only warn about this error: we will fallback to the non-optimized
> > +         * io_uring operations.
> > +         */
> > +        error_reportf_err(*errp,
> > +                         "failed to register linux io_uring ring file descriptor");
> 
> IIUC errp can be NULL, so let's not dereference it without checking. So, just
> use error_report?

Plenty of people will be running kernels that lack the new feature,
so this "failure" will be an expected scenario. We shouldn't be
spamming the logs with any error or warning message. Assuming QEMU
remains fully functional, merely not as optimized, we should be
totally silent.

At most stick in a 'trace' point so we can record whether the
optimization is present.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
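
A rough sketch of the "silent fallback plus trace point" idea, under stated assumptions: RingStateSketch, trace_ring_fd_register() and note_ring_fd_registration() are hypothetical names, and the trace function is a plain counter standing in for a generated QEMU trace event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state; QEMU's LuringState has different fields. */
typedef struct {
    bool ring_fd_registered;
} RingStateSketch;

/* Hypothetical stand-in for a trace point: records the call, prints nothing. */
static int trace_hits;
static bool trace_last_value;
static void trace_ring_fd_register(bool registered)
{
    trace_hits++;
    trace_last_value = registered;
}

/*
 * register_rc mimics the return value of the registration call:
 * negative on failure, for example on a kernel without the feature.
 */
void note_ring_fd_registration(RingStateSketch *s, int register_rc)
{
    assert(s != NULL);
    s->ring_fd_registered = register_rc >= 0;
    /* No error or warning: the fallback path is fully functional. */
    trace_ring_fd_register(s->ring_fd_registered);
}
```

The outcome stays observable to anyone who enables tracing, while a default startup on an older kernel logs nothing.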
Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Fam Zheng 1 year, 12 months ago
On 2022-04-22 09:52, Daniel P. Berrangé wrote:
> On Fri, Apr 22, 2022 at 09:34:28AM +0100, Fam Zheng wrote:
> > On 2022-04-22 00:36, Sam Li wrote:
> > > Linux recently added a new io_uring(7) optimization API that QEMU
> > > doesn't take advantage of yet. The liburing library that QEMU uses
> > > has added a corresponding new API calling io_uring_register_ring_fd().
> > > When this API is called after creating the ring, the io_uring_submit()
> > > library function passes a flag to the io_uring_enter(2) syscall
> > > allowing it to skip the ring file descriptor fdget()/fdput()
> > > operations. This saves some CPU cycles.
> > > 
> > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > ---
> > >  block/io_uring.c | 10 +++++++++-
> > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/io_uring.c b/block/io_uring.c
> > > index 782afdb433..5247fb79e2 100644
> > > --- a/block/io_uring.c
> > > +++ b/block/io_uring.c
> > > @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
> > >      }
> > >  
> > >      ioq_init(&s->io_q);
> > > -    return s;
> > > +    if (io_uring_register_ring_fd(&s->ring) < 0) {
> > > +        /*
> > > +         * Only warn about this error: we will fallback to the non-optimized
> > > +         * io_uring operations.
> > > +         */
> > > +        error_reportf_err(*errp,
> > > +                         "failed to register linux io_uring ring file descriptor");
> > 
> > IIUC errp can be NULL, so let's not dereference it without checking. So, just
> > use error_report?
> 
> Plenty of people will be running kernels that lack the new feature,
> so this "failure" will be an expected scenario. We shouldn't be
> spamming the logs with any error or warning message. Assuming  QEMU
> remains fully functional, merely not as optimized, we should be
> totally silent.

Functionally, that's a very valid point. But performance-wise, wouldn't it be
good to have some visibility into this? People choose io_uring over other
options almost certainly for performance, so the issue does matter quite a bit
here.

Fam

> 
> At most stick in a 'trace' point so we can record whether the
> optimization is present.
> 
> With regards,
> Daniel
> -- 
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
> 
> 
Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Daniel P. Berrangé 1 year, 12 months ago
On Fri, Apr 22, 2022 at 11:00:47AM +0100, Fam Zheng wrote:
> On 2022-04-22 09:52, Daniel P. Berrangé wrote:
> > On Fri, Apr 22, 2022 at 09:34:28AM +0100, Fam Zheng wrote:
> > > On 2022-04-22 00:36, Sam Li wrote:
> > > > Linux recently added a new io_uring(7) optimization API that QEMU
> > > > doesn't take advantage of yet. The liburing library that QEMU uses
> > > > has added a corresponding new API calling io_uring_register_ring_fd().
> > > > When this API is called after creating the ring, the io_uring_submit()
> > > > library function passes a flag to the io_uring_enter(2) syscall
> > > > allowing it to skip the ring file descriptor fdget()/fdput()
> > > > operations. This saves some CPU cycles.
> > > > 
> > > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > > ---
> > > >  block/io_uring.c | 10 +++++++++-
> > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/block/io_uring.c b/block/io_uring.c
> > > > index 782afdb433..5247fb79e2 100644
> > > > --- a/block/io_uring.c
> > > > +++ b/block/io_uring.c
> > > > @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
> > > >      }
> > > >  
> > > >      ioq_init(&s->io_q);
> > > > -    return s;
> > > > +    if (io_uring_register_ring_fd(&s->ring) < 0) {
> > > > +        /*
> > > > +         * Only warn about this error: we will fallback to the non-optimized
> > > > +         * io_uring operations.
> > > > +         */
> > > > +        error_reportf_err(*errp,
> > > > +                         "failed to register linux io_uring ring file descriptor");
> > > 
> > > IIUC errp can be NULL, so let's not dereference it without checking. So, just
> > > use error_report?
> > 
> > Plenty of people will be running kernels that lack the new feature,
> > so this "failure" will be an expected scenario. We shouldn't be
> > spamming the logs with any error or warning message. Assuming  QEMU
> > remains fully functional, merely not as optimized, we should be
> > totally silent.
> 
> Functionally, that's a very valid point. But performance wise, is it good to
> have some visibility of this? Since people use io_uring instead of other
> options almost certainly for performance, and here the issue does matter quite
> a bit.

IMHO what you describe is largely a documentation issue, and/or something
for OS vendors to worry about if they want to maximise their users'
performance. As long as io_uring is fully functional we shouldn't print
errors on every QEMU startup, as it leads to pointless bug reports/support
escalations about something that is operating normally, wasting users and
vendors' time.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH v4] Use io_uring_register_ring_fd() to skip fd operations
Posted by Stefan Hajnoczi 1 year, 12 months ago
On Fri, Apr 22, 2022 at 11:08:39AM +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 22, 2022 at 11:00:47AM +0100, Fam Zheng wrote:
> > On 2022-04-22 09:52, Daniel P. Berrangé wrote:
> > > On Fri, Apr 22, 2022 at 09:34:28AM +0100, Fam Zheng wrote:
> > > > On 2022-04-22 00:36, Sam Li wrote:
> > > > > Linux recently added a new io_uring(7) optimization API that QEMU
> > > > > doesn't take advantage of yet. The liburing library that QEMU uses
> > > > > has added a corresponding new API calling io_uring_register_ring_fd().
> > > > > When this API is called after creating the ring, the io_uring_submit()
> > > > > library function passes a flag to the io_uring_enter(2) syscall
> > > > > allowing it to skip the ring file descriptor fdget()/fdput()
> > > > > operations. This saves some CPU cycles.
> > > > > 
> > > > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > > > ---
> > > > >  block/io_uring.c | 10 +++++++++-
> > > > >  1 file changed, 9 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/block/io_uring.c b/block/io_uring.c
> > > > > index 782afdb433..5247fb79e2 100644
> > > > > --- a/block/io_uring.c
> > > > > +++ b/block/io_uring.c
> > > > > @@ -435,8 +435,16 @@ LuringState *luring_init(Error **errp)
> > > > >      }
> > > > >  
> > > > >      ioq_init(&s->io_q);
> > > > > -    return s;
> > > > > +    if (io_uring_register_ring_fd(&s->ring) < 0) {
> > > > > +        /*
> > > > > +         * Only warn about this error: we will fallback to the non-optimized
> > > > > +         * io_uring operations.
> > > > > +         */
> > > > > +        error_reportf_err(*errp,
> > > > > +                         "failed to register linux io_uring ring file descriptor");
> > > > 
> > > > IIUC errp can be NULL, so let's not dereference it without checking. So, just
> > > > use error_report?
> > > 
> > > Plenty of people will be running kernels that lack the new feature,
> > > so this "failure" will be an expected scenario. We shouldn't be
> > > spamming the logs with any error or warning message. Assuming  QEMU
> > > remains fully functional, merely not as optimized, we should be
> > > totally silent.
> > 
> > Functionally, that's a very valid point. But performance wise, is it good to
> > have some visibility of this? Since people use io_uring instead of other
> > options almost certainly for performance, and here the issue does matter quite
> > a bit.
> 
> IMHO what you describe is largely a documentation issue, and/or something
> for OS vendors to worry about if they want to maximise their users'
> performance. As long as io_uring is fully functional we shouldn't print
> errors on every QEMU startup, as it leads to pointless bug reports/support
> escalations about something that is operating normally, wasting users and
> vendors' time.

Also, this is a minor optimization. It's nice to save a few CPU cycles when
possible, but it's probably not significant enough that users would
bother to upgrade their kernel.

I think no warning is necessary.

Stefan